<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-bloggers</title>
	<atom:link href="https://www.r-bloggers.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.r-bloggers.com</link>
	<description>R news and tutorials contributed by hundreds of R bloggers</description>
	<lastBuildDate>Wed, 24 Jun 2026 16:00:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.18</generator>

<image>
	<url>https://i0.wp.com/www.r-bloggers.com/wp-content/uploads/2016/08/cropped-R_single_01-200.png?fit=32%2C32&#038;ssl=1</url>
	<title>R-bloggers</title>
	<link>https://www.r-bloggers.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">11524731</site>	<item>
		<title>SIM2 climate data</title>
		<link>https://www.r-bloggers.com/2026/06/sim2-climate-data/</link>
		
		<dc:creator><![CDATA[Michael]]></dc:creator>
		<pubDate>Wed, 24 Jun 2026 16:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r.iresmi.net/posts/2026/sim2/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Temperature change in France – CC-BY by University of Reading</p>
<p>As France enters its second heatwave of 2026, can we produce more detailed plots than the excellent visualizations provided by ShowYourStripes?<br />
Météo-France offers its monthly SI...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/sim2-climate-data/">SIM2 climate data</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://r.iresmi.net/posts/2026/sim2/"> r.iresmi.net</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 






<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://showyourstripes.info/b/europe/france/all" rel="nofollow" target="_blank"><img src="https://i0.wp.com/r.iresmi.net/posts/2026/sim2/images/EUROPE-France-_3CAll_20of_20France_3E-1850-2025-BK.png?w=578&#038;ssl=1" class="preview-image img-fluid figure-img" alt="A plot of temperature anomaly in France as blue and red stripes" data-recalc-dims="1"></a></p>
<figcaption>Temperature change in France – CC-BY by University of Reading</figcaption>
</figure>
</div>
<p>As France enters its second heatwave of 2026, can we produce more detailed plots than the excellent visualizations provided by <a href="https://showyourstripes.info/" rel="nofollow" target="_blank">ShowYourStripes</a>?</p>
<p>Météo-France offers its <a href="https://meteo.data.gouv.fr/datasets/65e040c50a5c6872ebebc711" rel="nofollow" target="_blank">monthly SIM2 dataset</a>, albeit over a shorter time span (currently 1970–2025). The dataset includes temperature, precipitation and other variables on an 8 km grid.</p>
<p>We will select a French city, retrieve its geographic coordinates, build the grid for a specific month over the 1970–2025 period, extract the data from the grid at that location and plot the temperature anomaly.</p>
<p>We will use {terra} to build the grid from the tabular files, which contain the cell centers and the weather variables, and <code>terra::extract()</code> to read the temperatures at our location for every year.</p>
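<p>To make the two {terra} steps concrete, here is a minimal base-R sketch of the same logic on a hypothetical four-row table (not real SIM2 data; <code>select_month()</code> and <code>cell_value()</code> are illustrative helpers, not part of any package). It assumes, as the <code>* 100</code> in <code>generate_chart()</code> suggests, that <code>lambx</code>/<code>lamby</code> are cell centres in hectometres and that <code>date</code> is coded as YYYYMM. On a regular 8 km grid, the cell containing a point is the one whose centre lies within 4 km of it along both axes, which is the cell <code>terra::extract()</code> reads for a point.</p>

```r
# Hypothetical miniature of a SIM2 table: two grid cells, two months each.
# Coordinates are assumed to be in hectometres, dates coded as YYYYMM strings.
sim2_toy <- data.frame(
  lambx = c(6000, 6080, 6000, 6080),
  lamby = c(24000, 24000, 24000, 24000),
  date  = c("197006", "197006", "197007", "197007"),
  t     = c(17.1, 16.8, 19.4, 19.0)
)

# Keep one calendar month ("01".."12") and convert coordinates to metres
select_month <- function(df, month) {
  date_chr <- as.character(df$date)
  keep <- substr(date_chr, 5, 6) == month
  data.frame(
    x    = df$lambx[keep] * 100,
    y    = df$lamby[keep] * 100,
    year = as.integer(substr(date_chr[keep], 1, 4)),
    temp = df$t[keep]
  )
}

# Value of the cell whose centre is within half a cell of the point on both
# axes; a point lying exactly on a cell edge matches both neighbours and the
# first match is returned; NA if the point is outside the grid
cell_value <- function(grid, px, py, half_cell = 4000) {
  hit <- abs(grid$x - px) <= half_cell & abs(grid$y - py) <= half_cell
  grid$temp[hit][1]
}

june <- select_month(sim2_toy, "06")
cell_value(june, px = 606000, py = 2401000)  # 16.8, the cell centred at x = 608000
```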
<div class="cell">
<pre>library(tidyverse)
library(fs)
library(janitor)
library(osmdata)
library(sf)
library(terra)
library(glue)
library(memoise)

invisible(
  Sys.setlocale(category = &quot;LC_ALL&quot;, 
                locale = &quot;en_GB.UTF8&quot;))


#' Geolocate using OSM (Nominatim API)
#'
#' using first result
#' Memoized function
#'
#' @param location (char): place name (geocodable via OSM)
#'
#' @returns (SpatVector): first result
geolocate &lt;- memoise(function(location) {
  loc &lt;- getbb(location, format_out = &quot;sf_polygon&quot;)  |&gt; 
    st_point_on_surface() |&gt;
    slice_head(n = 1)
  
  message(glue(&quot;{loc$display_name} — WGS84 : {loc$geometry}&quot;))
  
  return(loc |&gt;
           st_transform(&quot;IGNF:NTFLAMB2E&quot;) |&gt;
           as(&quot;SpatVector&quot;))
})

#' Generate a monthly temperature chart since 1970
#'
#' @param sim2 (data.frame): Météo-France SIM2 data over the period
#' @param month (char): month number &quot;01&quot;...&quot;12&quot;
#' @param location (char): place name (geocodable via OSM); memoized
#' @param output_dir (char): directory path where a PNG file will be written, if not NULL
#'
#' @returns (ggplot and optionally a file on disk)
generate_chart &lt;- function(
    sim2,
    month,
    location,
    output_dir = NULL) {
  stopifnot(month %in% sprintf(&quot;%02d&quot;, 1:12))
  month_name &lt;- format(ymd(glue(&quot;0000-{month}-01&quot;)), &quot;%B&quot;)
  
  sim2_raster &lt;- sim2 |&gt;
    filter(str_detect(date, glue(&quot;{month}$&quot;))) |&gt;
    mutate(
      x = lambx * 100,
      y = lamby * 100,
      layer = date,
      temp = t,
      .keep = &quot;none&quot;) |&gt;
    rast(
      type = &quot;xylz&quot;,
      crs = &quot;IGNF:NTFLAMB2E&quot;)
  
  loc &lt;- geolocate(location)
  
  temperatures &lt;- sim2_raster |&gt;
    terra::extract(loc) |&gt;
    select(-ID) |&gt;
    pivot_longer(
      cols = everything(),
      names_to = &quot;month&quot;,
      values_to = &quot;temperature&quot;) |&gt;
    mutate(
      year = as.integer(str_sub(month, 1, 4)),
      anomaly = temperature - mean(temperature[year &gt;= 1991 & year &lt;= 2020],
                                   na.rm = TRUE))
  
  p &lt;- temperatures |&gt;
    ggplot(aes(year, anomaly)) +
    geom_col(aes(fill = anomaly)) +
    geom_smooth(method = &quot;loess&quot;,
                formula = y ~ x) +
    scale_fill_gradient2(
      high = scales::muted(&quot;red&quot;),
      mid = &quot;white&quot;,
      low = scales::muted(&quot;blue&quot;)) +
    scale_x_continuous(breaks = scales::breaks_pretty()) +
    scale_y_continuous(breaks = scales::breaks_pretty()) +
    labs(
      title = glue(&quot;Average monthly anomaly temperature — {month_name}&quot;),
      subtitle = location,
      x = &quot;year&quot;,
      y = &quot;departure from average* (°C)&quot;,
      fill = &quot;°C&quot;,
      caption = glue(
        &quot;https://r.iresmi.net/ — {Sys.Date()}
         data: Météo-France SIM2 — *baseline: 1991–2020 normal for {month_name}&quot;)) +
    theme(
      text = element_text(family = &quot;Ubuntu&quot;),
      plot.caption = element_text(size = 7))
  
  if (!is.null(output_dir)) {
    dir_create(output_dir)
    
    ggsave(
      glue(&quot;{output_dir}/tm_{month}_{make_clean_names(location)}.png&quot;),
      plot = p,
      width = 20,
      height = 20 / 1.618,
      units = &quot;cm&quot;,
      dpi = 150)
  }
  
  return(p)
}</pre>
</div>
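<p>The anomaly step inside <code>generate_chart()</code> subtracts, from each year's value, the mean of the same month over the 1991–2020 normal period. Stripped of the raster and plotting code, it reduces to the base-R sketch below; <code>anomaly_vs_baseline()</code> is an illustrative helper and the temperature series is synthetic.</p>

```r
# Anomaly relative to a baseline period, mirroring
# temperature - mean(temperature[year >= 1991 & year <= 2020], na.rm = TRUE)
anomaly_vs_baseline <- function(year, temperature, from = 1991, to = 2020) {
  in_baseline <- year >= from & year <= to
  temperature - mean(temperature[in_baseline], na.rm = TRUE)
}

# Synthetic, steadily warming series (not SIM2 data): +0.05 °C per year
years <- 1988:2023
temps <- 18 + 0.05 * (years - 1988)

anom <- anomaly_vs_baseline(years, temps)
round(anom[years == 1988], 3)  # -0.875: the baseline mean is 18.875 °C
```

<p>By construction, the anomalies average to zero over the baseline years, so bars above zero are months warmer than the 1991–2020 normal.</p>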
<p>The data comes as a set of gzip-compressed, semicolon-delimited CSV files.</p>
<div class="cell">
<pre># https://meteo.data.gouv.fr/datasets/65e040c50a5c6872ebebc711
# Climate change data - monthly SIM
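# all files MENS_SIM2_*-*.csv.gz, downloaded into data/
# (the glob keeps any other file in data/ out of the import)
sim2 &lt;- dir_ls(&quot;data&quot;, glob = &quot;*MENS_SIM2_*.csv.gz&quot;) |&gt;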
  read_delim(
    delim = &quot;;&quot;,
    locale = locale(decimal_mark = &quot;.&quot;),
    name_repair = make_clean_names)</pre>
</div>
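<p>{readr} decompresses gzip files transparently and stacks several files passed as a vector. For readers without the tidyverse, a rough base-R equivalent is sketched below on two tiny files written to a temporary directory; the header names <code>LAMBX;LAMBY;DATE;T</code> are an assumption chosen to match the cleaned names used in <code>generate_chart()</code>, and the values are made up.</p>

```r
# Write two tiny, made-up gzip-compressed CSV files (semicolon-delimited)
dir <- file.path(tempdir(), "sim2_demo")
dir.create(dir, showWarnings = FALSE)
write_gz <- function(path, lines) {
  con <- gzfile(path, "w")
  on.exit(close(con))
  writeLines(lines, con)
}
write_gz(file.path(dir, "MENS_SIM2_1970-1979.csv.gz"),
         c("LAMBX;LAMBY;DATE;T", "6000;24000;197006;17.1", "6080;24000;197006;16.8"))
write_gz(file.path(dir, "MENS_SIM2_1980-1989.csv.gz"),
         c("LAMBX;LAMBY;DATE;T", "6000;24000;198006;17.5", "6080;24000;198006;17.2"))

# List only the SIM2 files, read each one through gzfile() and stack them
files <- list.files(dir, pattern = "^MENS_SIM2_.*\\.csv\\.gz$", full.names = TRUE)
sim2_demo <- do.call(rbind, lapply(files, function(f) {
  read.csv(gzfile(f), sep = ";", dec = ".")
}))
names(sim2_demo) <- tolower(names(sim2_demo))  # crude stand-in for make_clean_names()
```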
<p>Now we just call our function.</p>
<div class="cell">
<pre>generate_chart(sim2,
               month = &quot;06&quot;,
               location = &quot;Paris, France&quot;)</pre>
<div class="cell-output-display">
<div id="fig-process" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Bar plot of temperature anomaly in Paris">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-process-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://i2.wp.com/r.iresmi.net/posts/2026/sim2/index_files/figure-html/fig-process-1.png?w=578&#038;ssl=1" class="img-fluid figure-img" style="width:100.0%" alt="Bar plot of temperature anomaly in Paris" data-recalc-dims="1">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-process-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure 1: June temperature in Paris 1970–2025
</figcaption>
</figure>
</div>
</div>
</div>
<p>To generate a chart for every month and save each one to <code>results/</code>:</p>
<div class="cell">
<pre># for each month
sprintf(&quot;%02d&quot;, 1:12) |&gt;
  map(\(x) generate_chart(sim2,
                           month = x,
                           location = &quot;Grenoble, France&quot;,
                           output_dir = &quot;results&quot;), 
  .progress = TRUE)</pre>
</div>
<p>Note that the scales differ from plot to plot; to compare months (or places), we should fix the y-axis and the color scale. Making a nice poster is left as an exercise for the reader…</p>
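<p>One way to do this, sketched here under the assumption that <code>generate_chart()</code> is extended with a hypothetical <code>limits</code> argument, is to compute a single symmetric range from all the anomalies first and reuse it for both the y-axis and the fill scale. The helper <code>common_limits()</code> is illustrative and the anomaly vectors are placeholders.</p>

```r
# Symmetric limits covering every anomaly series, so that 0 sits in the
# middle of the diverging fill scale and all charts share one y range
common_limits <- function(...) {
  m <- max(abs(unlist(list(...))), na.rm = TRUE)
  c(-m, m)
}

lims <- common_limits(c(-1.2, 0.4, 2.1), c(-2.6, 0.9), c(0.3, 1.7))
lims  # c(-2.6, 2.6)

# Inside a modified generate_chart(), lims could then be passed as
#   scale_y_continuous(limits = lims, breaks = scales::breaks_pretty())
#   scale_fill_gradient2(high = ..., mid = "white", low = ..., limits = lims)
```

<p>Scale limits drop values that fall outside them, which is harmless for the bars since the range covers all the data by construction; the loess ribbon, however, can extend beyond it, so <code>coord_cartesian(ylim = lims)</code>, which only zooms, may be the safer choice for the y-axis.</p>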


<!-- -->


 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://r.iresmi.net/posts/2026/sim2/"> r.iresmi.net</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/sim2-climate-data/">SIM2 climate data</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402233</post-id>	</item>
		<item>
		<title>Set Working Directory in R: setwd() &#038; RStudio GUI Guide</title>
		<link>https://www.r-bloggers.com/2026/06/set-working-directory-in-r-setwd-rstudio-gui-guide/</link>
		
		<dc:creator><![CDATA[Unknown]]></dc:creator>
		<pubDate>Wed, 24 Jun 2026 08:24:57 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=76daad1a014d4de2f77abf2ed8325ea6</guid>

					<description><![CDATA[<p>Learn how to set your working directory in R using setwd() or the RStudio Session menu. Covers getwd(), Windows path errors, and the here() package for dissertation projects.</p>
<p>Read More »</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/set-working-directory-in-r-setwd-rstudio-gui-guide/">Set Working Directory in R: setwd() & RStudio GUI Guide</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rstudiodatalab.com/2023/10/how-to-set-working-directory-setwd-in-r.html"> RStudioDataLab</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<!-- Explicit post snippet -->
<div hidden="" aria-hidden="true">Learn how to set your working directory in R using setwd() or the RStudio Session menu. Covers getwd(), Windows path errors, and the here() package for dissertation projects.</div>

<a href="https://www.rstudiodatalab.com/2023/10/how-to-set-working-directory-setwd-in-r.html#more" rel="nofollow" target="_blank">Read More »</a>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rstudiodatalab.com/2023/10/how-to-set-working-directory-setwd-in-r.html"> RStudioDataLab</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/set-working-directory-in-r-setwd-rstudio-gui-guide/">Set Working Directory in R: setwd() & RStudio GUI Guide</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402226</post-id>	</item>
		<item>
		<title>Introduction to Bayesian Multiple Imputation with the rblimp package workshop</title>
		<link>https://www.r-bloggers.com/2026/06/introduction-to-bayesian-multiple-imputation-with-the-rblimp-package-workshop/</link>
		
		<dc:creator><![CDATA[Dariia Mykhailyshyna]]></dc:creator>
		<pubDate>Tue, 23 Jun 2026 12:23:16 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=19320</guid>

					<description><![CDATA[<p>Join our workshop on Introduction to Bayesian Multiple Imputation with the rblimp package,  which is a part of our workshops for Ukraine series!  Here’s some more info:  Title: Introduction to Bayesian Multiple Imputation with the rblimp package Date: Thursday, July 23rd, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)  Speaker: Ermioni Athanasiadi ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/introduction-to-bayesian-multiple-imputation-with-the-rblimp-package-workshop/">Introduction to Bayesian Multiple Imputation with the rblimp package workshop</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/introduction-to-bayesian-multiple-imputation-with-the-rblimp-package-workshop/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><span style="font-weight: 400">Join our workshop on Introduction to Bayesian Multiple Imputation with the rblimp package,</span> <span style="font-weight: 400"> which is a part of our workshops for Ukraine series! </span></p>
<br />
<p><b>Here’s some more info: </b></p>
<br />
<p><b>Title</b><span style="font-weight: 400">: Introduction to Bayesian Multiple Imputation with the rblimp package</span></p>
<p><b>Date</b><span style="font-weight: 400">: Thursday, July 23rd, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone) </span></p>
<p><b>Speaker</b><span style="font-weight: 400">: Ermioni Athanasiadi is a PhD student at the University of Siegen. She holds a master’s degree in psychology from the University of Tübingen and is currently pursuing a master’s degree in Statistics and Data Science at Hasselt University, Belgium. Her research focuses on missing-data methods in small-sample settings. She is also passionate about interdisciplinary perspectives on statistics and research methodology.</span></p>
<p><b>Description: </b><span style="font-weight: 400">Multiple imputation has become the gold standard for handling missing data in applied research, and many researchers use the mice package for it. Blimp is a flexible alternative that performs multiple imputation within a fully Bayesian framework. In this workshop, we will cover the basics of multiple imputation and Blimp’s modeling framework. We will then explore how to specify imputation models, incorporate auxiliary variables, assess convergence, run post-imputation diagnostics, and conduct pooled analyses of multiply imputed datasets.</span></p>
<p><span style="font-weight: 400">The workshop may be helpful both for those who are new to multiple imputation, and for those who have previously used multiple imputation methods and would like to learn about a Bayesian alternative.</span></p>
<p><span style="font-weight: 400">Participants should install the Blimp software beforehand: https://www.appliedmissingdata.com/blimp</span></p>
<p><b>Minimum registration fee:</b><span style="font-weight: 400"> 20 euro (or 20 USD or 800 UAH)</span></p>
<br />
<p><span style="font-weight: 400">Please note that registration confirmations are sent to all registered participants 1 day before the workshop, rather than immediately after registration.</span></p>
<br />
<p><b>How can I register?</b></p>
<br />
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Go to </span><a href="https://bit.ly/3wvwMA6" rel="nofollow" target="_blank"><span style="font-weight: 400">https://bit.ly/3wvwMA6</span></a><span style="font-weight: 400"> or </span><a href="https://bit.ly/4aD5LMC" rel="nofollow" target="_blank"><span style="font-weight: 400">https://bit.ly/4aD5LMC</span></a><span style="font-weight: 400">  or  </span><a href="https://bit.ly/3PFxtNA" rel="nofollow" target="_blank"><span style="font-weight: 400">https://bit.ly/3PFxtNA</span></a><span style="font-weight: 400"> and donate</span><b> at least 20 euro</b><span style="font-weight: 400">. </span><span style="font-weight: 400">Feel free to donate more if you can; all proceeds go directly to support Ukraine.</span></li>
</ul>
<br />
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)</span></li>
</ul>
<br />
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Fill in the </span><a href="https://forms.gle/TD1BPw1k983cSKBy9" rel="nofollow" target="_blank"><span style="font-weight: 400">registration form</span></a><span style="font-weight: 400">, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).</span></li>
</ul>
<br />
<p><span style="font-weight: 400">If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation, so that the student can attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it to us to allocate the sponsored place to students who have signed up for the waiting list.</span></p>
<br />
<p><b>How can I sponsor a student?</b></p>
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Go to </span><a href="https://bit.ly/3wvwMA6" rel="nofollow" target="_blank"><span style="font-weight: 400">https://bit.ly/3wvwMA6</span></a><span style="font-weight: 400"> or </span><a href="https://bit.ly/4aD5LMC" rel="nofollow" target="_blank"><span style="font-weight: 400">https://bit.ly/4aD5LMC</span></a><span style="font-weight: 400">  or </span><a href="https://bit.ly/3PFxtNA" rel="nofollow" target="_blank"><span style="font-weight: 400">https://bit.ly/3PFxtNA</span></a><span style="font-weight: 400"> and donate </span><b>at least 20 euro </b><span style="font-weight: 400">(or 17 GBP or 20 USD or 800 UAH). </span><span style="font-weight: 400">Feel free to donate more if you can; all proceeds go to support Ukraine!</span></li>
</ul>
<br />
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)</span></li>
</ul>
<br />
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Fill in the </span><a href="https://forms.gle/yYfkiaCcaJ8XYxde6" rel="nofollow" target="_blank"><span style="font-weight: 400">sponsorship form</span></a><span style="font-weight: 400">, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or would rather let us allocate this spot to a student from the waiting list. You can also indicate whether you would like us to prioritize students from developing countries when assigning the place(s) you sponsored.</span></li>
</ul>
<br />
<br />
<p><span style="font-weight: 400">If you are a university student and cannot afford the registration fee, you can also sign up for the </span><b>waiting list</b> <a href="https://forms.gle/koBfqdf7gAsgrAB26" rel="nofollow" target="_blank"><span style="font-weight: 400">here</span></a><span style="font-weight: 400">. (Note that you are not guaranteed to participate by signing up for the waiting list).</span></p>
<br />
<br />
<p><span style="font-weight: 400">You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, with their recordings &#038; materials, </span><a href="http://bit.ly/3wBeY4S" rel="nofollow" target="_blank"><span style="font-weight: 400">here</span></a><span style="font-weight: 400">.</span></p>
<br />
<p><span style="font-weight: 400">Looking forward to seeing you during the workshop!</span></p><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/introduction-to-bayesian-multiple-imputation-with-the-rblimp-package-workshop/" rel="nofollow" target="_blank">Introduction to Bayesian Multiple Imputation with the rblimp package workshop</a> was first posted on June 23, 2026 at 12:23 pm.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/introduction-to-bayesian-multiple-imputation-with-the-rblimp-package-workshop/"> R-posts.com</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/introduction-to-bayesian-multiple-imputation-with-the-rblimp-package-workshop/">Introduction to Bayesian Multiple Imputation with the rblimp package workshop</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402209</post-id>	</item>
		<item>
		<title>Flowcharts that belong in the analysis pipeline</title>
		<link>https://www.r-bloggers.com/2026/06/flowcharts-that-belong-in-the-analysis-pipeline/</link>
		
		<dc:creator><![CDATA[Max Gordon]]></dc:creator>
		<pubDate>Tue, 23 Jun 2026 11:25:59 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://gforge.se/?p=2327</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Thanks to Alan Haynes and his excellent suggestions, I have spent some time improving the flowchart component of the Gmisc package. The result is not meant to be another decorative diagram tool. It is meant for the kind of figures … Continue reading →</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/flowcharts-that-belong-in-the-analysis-pipeline/">Flowcharts that belong in the analysis pipeline</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://gforge.se/2026/06/flowcharts-that-belong-in-the-analysis-pipeline/"> R – G-Forge</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<div id="attachment_2332" style="width: 810px" class="wp-caption aligncenter"><a href="https://i2.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/Waterfall_Wasif_Mallk.jpg?ssl=1" rel="nofollow" target="_blank"><img loading="lazy" fetchpriority="high" decoding="async" aria-describedby="caption-attachment-2332" src="https://i2.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/Waterfall_Wasif_Mallk.jpg?w=450&#038;ssl=1" alt="" class="size-full wp-image-2332" srcset_temp="https://i2.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/Waterfall_Wasif_Mallk.jpg?w=450&#038;ssl=1 800w, https://www-static.gforge.se/wp-content/uploads/2026/06/Waterfall_Wasif_Mallk-300x225.jpg 300w, https://www-static.gforge.se/wp-content/uploads/2026/06/Waterfall_Wasif_Mallk-768x576.jpg 768w, https://www-static.gforge.se/wp-content/uploads/2026/06/Waterfall_Wasif_Mallk-400x300.jpg 400w" sizes="(max-width: 800px) 100vw, 800px" data-recalc-dims="1" /></a><p id="caption-attachment-2332" class="wp-caption-text">Flowcharts should be beautiful, just like this CC photo from <a href="https://flic.kr/p/bGN64z" rel="nofollow" target="_blank">Wasif Malik</a>.</p></div>
<p>Thanks to <a href="https://github.com/aghaynes" rel="nofollow" target="_blank">Alan Haynes</a> and
his excellent suggestions, I have spent some time improving the
flowchart component of the Gmisc package. The result is not meant to be
another decorative diagram tool. It is meant for the kind of figures
researchers keep redrawing by hand: CONSORT diagrams, cohort derivation
charts, screening flows, data-cleaning audit trails, and the small but
important maps that explain how a study population came to be.</p>
<p>I like tools such as <a href="https://excalidraw.com/" rel="nofollow" target="_blank">Excalidraw</a>
for thinking. They are fast, expressive, and excellent for
conversations. But when a figure enters a manuscript, the needs change.
Counts must be updated. Exclusions must match the analysis script.
Treatment arms should align. Follow-up losses should be traceable. The
figure should survive the third round of review without becoming a manual
editing project.</p>
<p>That is the space where <code>flowchart()</code> in Gmisc is useful:
the diagram becomes part of the research workflow.</p>
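<p>As a minimal sketch of what this can mean in practice, the counts can be computed from the analysis data and pasted into the box labels instead of being typed by hand. The cohort table and its columns (<code>arm</code>, <code>lost</code>, <code>analysed</code>) are hypothetical; the resulting strings could be passed as the label, the first argument, of <code>boxGrob()</code> as in the code further down.</p>

```r
# Hypothetical cohort: one row per randomised patient
cohort <- data.frame(
  arm      = c(rep("cast", 5), rep("surgery", 4)),
  lost     = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  analysed = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# One label per arm, built from the same data the analysis uses, so a
# change in exclusions updates the diagram automatically
arm_labels <- vapply(
  split(cohort, cohort$arm),
  function(d) sprintf("Lost to follow-up (n = %d)\nIncluded in analysis (n = %d)",
                      sum(d$lost), sum(d$analysed)),
  character(1)
)
cat(arm_labels[["surgery"]])
```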
<p><img decoding="async" src="https://i1.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/01-consort-color.png?w=578&#038;ssl=1"
alt="A colorful CONSORT-style flowchart generated with Gmisc" data-recalc-dims="1" /></p>
<p>The figure above is the kind of chart I want Gmisc to make feel
natural. It is still a grid graphic in R, but it has the visual grammar
of a manuscript figure: grouped arms, side exclusions, count badges,
phase labels, and arrows that do not need nudging after every text
change.</p>
<p>Every figure in this post is generated by code, and the code is
included below each image. They all share the same two-line
preamble:</p>
<pre>
library(Gmisc)
library(grid)
</pre>
<p>To save any of them to a file, open a graphics device before the
drawing code and close it with <code>dev.off()</code> afterwards, e.g.:</p>
<pre>
png(&quot;01-consort-color.png&quot;, width = 9, height = 7, units = &quot;in&quot;, res = 180, bg = &quot;white&quot;)
# ... the flowchart code ...
dev.off()
</pre>
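<p>If the drawing code fails, the device stays open and later plots end up in the wrong place. A small hypothetical helper, <code>with_device()</code> (not part of Gmisc or grDevices), can guarantee that <code>dev.off()</code> runs by registering it with <code>on.exit()</code>; R's lazy evaluation means the drawing code only runs once the device is open. The example saves a base plot to a temporary PDF so it runs anywhere; with <code>png</code> and the arguments shown above it would write the PNG instead. Because the drawing code is evaluated inside a function, nothing is auto-printed, so objects that draw only when printed need an explicit <code>print()</code>.</p>

```r
# Hypothetical helper: open a file device, run the drawing code, and always
# close the device, even if the drawing code throws an error
with_device <- function(device, file, code, ...) {
  device(file, ...)
  on.exit(grDevices::dev.off(), add = TRUE)
  force(code)  # `code` is a promise, evaluated only now that the device is open
  invisible(file)
}

out <- file.path(tempdir(), "demo-plot.pdf")
with_device(grDevices::pdf, out, graphics::plot(1:10), width = 5, height = 4)
file.exists(out)

# For a flowchart, e.g. (flowchart_object is a hypothetical placeholder):
# with_device(png, "01-consort-color.png", {
#   grid.newpage()
#   print(flowchart_object)
# }, width = 9, height = 7, units = "in", res = 180, bg = "white")
```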
<p>The CONSORT figure above is produced by:</p>
<pre>
options(boxGrobTxtPadding = unit(3, &quot;mm&quot;))

box_fill &lt;- gpar(fill = &quot;#DDEEFF&quot;, col = &quot;#336699&quot;, lwd = 1.5)
con_gp &lt;- gpar(col = &quot;#336699&quot;, lwd = 1.5, fill = &quot;#336699&quot;)
side_gp &lt;- gpar(col = &quot;#CC8800&quot;, lwd = 1.2, fill = &quot;#CC8800&quot;)
excl_fill &lt;- gpar(fill = &quot;#FFF8E1&quot;, col = &quot;#CC8800&quot;, lwd = 1.2)
heading_gp &lt;- gpar(fill = &quot;#C8DAF7&quot;, col = &quot;#2F5F9F&quot;, lwd = 1.1)
badge_gp &lt;- gpar(fill = &quot;#336699&quot;, col = NA)
badge_txt_gp &lt;- gpar(col = &quot;white&quot;, cex = 0.65)

main_arm_margin &lt;- 0.28
main_x &lt;- 0.5
exclusion_margin &lt;- 0.05

grid.newpage()
flowchart(
  assessed = boxGrob(
    &quot;Patients assessed for eligibility&quot;,
    x = main_x, box_gp = box_fill,
    badge_label = &quot;840&quot;, badge_gp = badge_gp, badge_txt_gp = badge_txt_gp
  ),
  randomised = boxGrob(
    &quot;Randomised&quot;,
    x = main_x, box_gp = box_fill,
    badge_label = &quot;126&quot;, badge_gp = badge_gp, badge_txt_gp = badge_txt_gp
  ),
  arms = list(
    cast = boxGrob(
      &quot;Randomised to\ncast immobilisation&quot;,
      box_gp = box_fill,
      badge_label = &quot;62&quot;, badge_gp = badge_gp, badge_txt_gp = badge_txt_gp
    ),
    surgical = boxGrob(
      &quot;Randomised to\nsurgery&quot;,
      box_gp = box_fill,
      badge_label = &quot;64&quot;, badge_gp = badge_gp, badge_txt_gp = badge_txt_gp
    )
  ),
  lost = list(
    lost_cast = boxGrob(
      &quot;Lost to follow-up (n = 2)\n  1 no response\n  1 other surgery&quot;,
      just = &quot;left&quot;, box_gp = excl_fill
    ),
    lost_surgical = boxGrob(
      &quot;Lost to follow-up (n = 3)\n  2 no response\n  1 other surgery&quot;,
      just = &quot;left&quot;, box_gp = excl_fill
    )
  ),
  analysis = list(
    analysis_cast = boxGrob(
      &quot;Included in\nprimary analysis&quot;,
      box_gp = box_fill,
      badge_label = &quot;60&quot;, badge_gp = badge_gp, badge_txt_gp = badge_txt_gp
    ),
    analysis_surgical = boxGrob(
      &quot;Included in\nprimary analysis&quot;,
      box_gp = box_fill,
      badge_label = &quot;61&quot;, badge_gp = badge_gp, badge_txt_gp = badge_txt_gp
    )
  )
) |&gt;
  spread(axis = &quot;y&quot;, margin = unit(5, &quot;mm&quot;), exclude = &quot;lost&quot;) |&gt;
  align(
    axis = &quot;y&quot;,
    subelement = &quot;lost&quot;,
    references = list(&quot;arms&quot;, &quot;analysis&quot;)
  ) |&gt;
  equalizeWidths(subelement = list(&quot;arms&quot;, &quot;analysis&quot;)) |&gt;
  spread(axis = &quot;x&quot;, subelement = &quot;arms&quot;, margin = main_arm_margin) |&gt;
  spread(axis = &quot;x&quot;, subelement = &quot;analysis&quot;, margin = main_arm_margin) |&gt;
  spread(axis = &quot;x&quot;, subelement = &quot;lost&quot;, margin = exclusion_margin) |&gt;
  phaseLabel(&quot;arms&quot;, &quot;Allocation&quot;, box_gp = heading_gp) |&gt;
  phaseLabel(&quot;analysis&quot;, &quot;Analysis&quot;, box_gp = heading_gp) |&gt;
  insert(list(excluded = boxGrob(
    &quot;Excluded (n = 714)\n  477 stable ankle mortise\n   64 incongruent ankle mortise\n   30 previous serious trauma\n  143 other reasons&quot;,
    just = &quot;left&quot;, box_gp = excl_fill
  )), after = &quot;assessed&quot;) |&gt;
  move(subelement = &quot;excluded&quot;, x = 1 - exclusion_margin, just = &quot;right&quot;) |&gt;
  align(
    axis = &quot;y&quot;,
    subelement = &quot;excluded&quot;,
    references = list(&quot;assessed&quot;, &quot;randomised&quot;)
  ) |&gt;
  connect(&quot;assessed&quot;, &quot;excluded&quot;, type = &quot;L&quot;, lty_gp = side_gp, arrow_size = 3, smooth = TRUE) |&gt;
  connect(&quot;randomised&quot;, &quot;arms&quot;, type = &quot;N&quot;, lty_gp = con_gp, arrow_size = 3, smooth = TRUE) |&gt;
  connect(&quot;assessed&quot;, &quot;randomised&quot;, type = &quot;v&quot;, lty_gp = con_gp, arrow_size = 3, smooth = TRUE) |&gt;
  connect(&quot;arms&quot;, &quot;lost&quot;, type = &quot;L&quot;, lty_gp = side_gp, arrow_size = 3, smooth = TRUE) |&gt;
  connect(&quot;arms&quot;, &quot;analysis&quot;, type = &quot;v&quot;, lty_gp = con_gp, arrow_size = 3) |&gt;
  print()
</pre>
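<p>Because the counts live in code, the arithmetic behind the figure can be checked in the same script. A minimal sketch using plain <code>stopifnot()</code>; the counts are copied from the figure above, whereas in a real project they would come from the trial data:</p>

```r
# Counts copied from the CONSORT figure above
consort <- list(
  assessed = 840L,
  excluded = c(stable = 477L, incongruent = 64L, trauma = 30L, other = 143L),
  arms     = c(cast = 62L, surgical = 64L),
  lost     = c(cast = 2L, surgical = 3L),
  analysed = c(cast = 60L, surgical = 61L)
)

# Patients left after the eligibility exclusions
randomised <- consort$assessed - sum(consort$excluded)

# Every box must add up; stopifnot() errors on the first failing condition
stopifnot(
  sum(consort$excluded) == 714L,   # matches "Excluded (n = 714)"
  randomised == 126L,              # matches the "Randomised" badge
  sum(consort$arms) == randomised, # both arms together
  all(consort$arms - consort$lost == consort$analysed)
)
```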
<h2 id="a-figure-that-can-change-with-the-analysis">A figure that can
change with the analysis</h2>
<p>The biggest advantage of drawing a flowchart in code is not that code
is elegant. It is that research figures are rarely finished when we
think they are.</p>
<p>The inclusion count changes after a database refresh. A reviewer asks
for a sensitivity analysis. Someone notices that two exclusion
categories should be split. The statistician reruns the cohort
definition. If the diagram is hand-drawn, every one of those changes
creates a small risk of mismatch between the paper and the actual
analysis.</p>
<p>If the chart is generated, it can sit beside the code that produced
the numbers.</p>
<pre>
flowchart(...) |&gt;
  spread(axis = &quot;y&quot;) |&gt;
  spread(subelement = &quot;arms&quot;, axis = &quot;x&quot;) |&gt;
  connect(&quot;randomised&quot;, &quot;arms&quot;, type = &quot;N&quot;)
</pre>
<p>That is the mental model: define boxes, arrange boxes, connect boxes.
The final result can still be polished, but it remains reproducible.</p>
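<p>To make that concrete, here is a minimal sketch where the box text is built from a data frame instead of being typed by hand. The <code>screening</code> data frame and its <code>eligible</code> column are hypothetical, and the drawing step only uses <code>flowchart()</code>, <code>boxGrob()</code>, <code>spread()</code>, and <code>connect()</code> as they appear in this post, guarded so it runs only with Gmisc 3.4.0 or later:</p>

```r
# Hypothetical screening data: one row per assessed patient
screening <- data.frame(
  id = 1:840,
  eligible = rep(c(TRUE, FALSE), times = c(126, 714))
)

n_assessed   <- nrow(screening)
n_randomised <- sum(screening$eligible)

# Box text is derived from the data, so a database refresh updates the figure
assessed_label   <- sprintf("Patients assessed for eligibility\nn = %d", n_assessed)
randomised_label <- sprintf("Randomised\nn = %d", n_randomised)

# Draw only when a recent enough Gmisc is installed
if (requireNamespace("Gmisc", quietly = TRUE) &&
    utils::packageVersion("Gmisc") >= "3.4.0") {
  library(Gmisc)
  library(grid)
  grid.newpage()
  flowchart(
    assessed = boxGrob(assessed_label),
    randomised = boxGrob(randomised_label)
  ) |>
    spread(axis = "y") |>
    connect("assessed", "randomised", type = "v") |>
    print()
}
```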
<h2 id="cohort-derivation-from-data-people-already-have">Cohort
derivation from data people already have</h2>
<p>Most clinical researchers do not start with a perfect trial flow.
They start with a registry extract, an EHR table, a REDCap project, an
Excel sheet from a collaborator, or a combination of all of them.</p>
<p>That workflow deserves a clear figure too.</p>
<p><img decoding="async" src="https://i0.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/02-registry-cohort.png?w=578&#038;ssl=1"
alt="Registry and EHR cohort derivation flowchart" data-recalc-dims="1" /></p>
<p>This kind of diagram is useful because it shows not only who was
included but also how the study base was assembled: which sources were
linked, where exclusions entered, and which analytic populations came
out at the end.</p>
<p>I find this especially helpful for observational studies. A table can
report baseline characteristics, but a flowchart explains the
construction of the cohort. It gives the reader a quick answer to the
question “What happened between the raw data and the model?”</p>
<pre>
source_gp &lt;- gpar(fill = &quot;#E8F5E9&quot;, col = &quot;#2E7D32&quot;, lwd = 1.4)
link_gp &lt;- gpar(fill = &quot;#E3F2FD&quot;, col = &quot;#1565C0&quot;, lwd = 1.4)
cohort_gp &lt;- gpar(fill = &quot;#FFF8E1&quot;, col = &quot;#C69214&quot;, lwd = 1.4)
side_gp &lt;- gpar(fill = &quot;#FCE4EC&quot;, col = &quot;#AD1457&quot;, lwd = 1.2)
final_gp &lt;- gpar(fill = &quot;#EDE7F6&quot;, col = &quot;#512DA8&quot;, lwd = 1.4)
con_gp &lt;- gpar(col = &quot;#455A64&quot;, fill = &quot;#455A64&quot;, lwd = 1.4)
excl_gp &lt;- gpar(col = &quot;#AD1457&quot;, fill = &quot;#AD1457&quot;, lwd = 1.2)

source_margin &lt;- 0.05
output_margin &lt;- 0.05
main_x &lt;- 0.5
main_path &lt;- list(&quot;linked&quot;, &quot;cohort&quot;)
exclusion_right &lt;- 0.95
exclusion_gap &lt;- unit(5, &quot;pt&quot;)
exclusion_line_offset &lt;- unit(14, &quot;mm&quot;)

grid.newpage()
flowchart(
  sources = list(
    ehr = boxGrob(&quot;Hospital EHR\nadmissions\nn = 241,820&quot;,
                  box_gp = source_gp),
    registry = boxGrob(&quot;Quality registry\nprocedures\nn = 38,420&quot;,
                       box_gp = source_gp),
    deaths = boxGrob(&quot;Population registry\nfollow-up\nn = 100%&quot;,
                     box_gp = source_gp)
  ),
  linked = boxGrob(
    &quot;Linked study base\nunique patients with follow-up\nn = 29,614&quot;,
    x = main_x,
    box_gp = link_gp,
    width = unit(72, &quot;mm&quot;)
  ),
  exclusions = list(
    prior = boxGrob(&quot;Previous diagnosis\nn = 4,108&quot;,
                    just = &quot;left&quot;, box_gp = side_gp,
                    width = unit(42, &quot;mm&quot;)),
    missing = boxGrob(&quot;Missing key\ncovariates\nn = 962&quot;,
                      just = &quot;left&quot;, box_gp = side_gp,
                      width = unit(42, &quot;mm&quot;)),
    outside = boxGrob(&quot;Outside study\nwindow\nn = 1,327&quot;,
                      just = &quot;left&quot;, box_gp = side_gp,
                      width = unit(42, &quot;mm&quot;))
  ),
  cohort = boxGrob(
    &quot;Primary cohort\nn = 23,217&quot;,
    box_gp = cohort_gp,
    width = unit(62, &quot;mm&quot;)
  ),
  outputs = list(
    primary = boxGrob(&quot;Primary analysis\ncomplete case\nn = 22,144&quot;,
                      box_gp = final_gp),
    imputed = boxGrob(&quot;Sensitivity analysis\nmultiple imputation\nn = 23,217&quot;,
                      box_gp = final_gp),
    negative = boxGrob(&quot;Negative control\noutcome check\nn = 21,903&quot;,
                       box_gp = final_gp)
  )
) |&gt;
  spread(axis = &quot;y&quot;, margin = unit(8, &quot;mm&quot;), exclude = &quot;exclusions&quot;) |&gt;
  equalizeWidths(subelement = main_path) |&gt;
  align(axis = &quot;x&quot;, subelement = &quot;cohort&quot;, reference = &quot;linked&quot;) |&gt;
  move(subelement = c(&quot;exclusions&quot;, &quot;prior&quot;),
       y = position(&quot;linked&quot;, position = &quot;center&quot;, type = &quot;y&quot;) - exclusion_gap,
       just = c(NA, &quot;top&quot;)) |&gt;
  move(subelement = c(&quot;exclusions&quot;, &quot;missing&quot;),
       y = position(c(&quot;exclusions&quot;, &quot;prior&quot;), position = &quot;bottom&quot;, type = &quot;y&quot;) - exclusion_gap,
       just = c(NA, &quot;top&quot;)) |&gt;
  move(subelement = c(&quot;exclusions&quot;, &quot;outside&quot;),
       y = position(c(&quot;exclusions&quot;, &quot;missing&quot;), position = &quot;bottom&quot;, type = &quot;y&quot;) - exclusion_gap,
       just = c(NA, &quot;top&quot;)) |&gt;
  equalizeWidths(subelement = &quot;sources&quot;) |&gt;
  equalizeWidths(subelement = &quot;exclusions&quot;, width = unit(42, &quot;mm&quot;)) |&gt;
  equalizeWidths(subelement = &quot;outputs&quot;) |&gt;
  move(subelement = &quot;exclusions&quot;, x = exclusion_right, just = &quot;right&quot;) |&gt;
  spread(axis = &quot;x&quot;, subelement = &quot;sources&quot;, margin = source_margin, type = &quot;center&quot;) |&gt;
  spread(axis = &quot;x&quot;, subelement = &quot;outputs&quot;, margin = output_margin, type = &quot;center&quot;) |&gt;
  connect(&quot;sources&quot;, &quot;linked&quot;, type = &quot;vertical_axis&quot;, lty_gp = con_gp, arrow_size = 3) |&gt;
  connect(&quot;linked&quot;, &quot;cohort&quot;, type = &quot;v&quot;, lty_gp = con_gp, arrow_size = 3, smooth = TRUE) |&gt;
  connect(&quot;linked&quot;, &quot;exclusions&quot;,
          type = &quot;side&quot;, lty_gp = excl_gp, arrow_size = 3,
          side = &quot;right&quot;, end_side = &quot;left&quot;,
          side_route = &quot;outside&quot;,
          side_offset = exclusion_line_offset,
          label = &quot;Excluded\nn = 6,397&quot;,
          label_gp = gpar(col = &quot;#AD1457&quot;, cex = 0.8)) |&gt;
  connect(&quot;cohort&quot;, &quot;outputs&quot;, type = &quot;N&quot;, lty_gp = con_gp, arrow_size = 3, smooth = TRUE) |&gt;
  print()
</pre>
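<p>Exclusion counts like these come from applying the criteria one after another. The order matters: a patient can meet several criteria but should be counted only once, at the first one they fail. A minimal base-R sketch, assuming a hypothetical simulated <code>base</code> data frame with one logical column per criterion:</p>

```r
# Hypothetical linked study base, simulated: one row per patient
set.seed(1)
n <- 1000
base <- data.frame(
  id = seq_len(n),
  prior_dx = runif(n) < 0.14,          # previous diagnosis
  missing_covariate = runif(n) < 0.04, # key covariate missing
  in_window = runif(n) > 0.05          # inside the study window
)

# Apply exclusions in the order shown in the flowchart
step1  <- base[!base$prior_dx, ]
step2  <- step1[!step1$missing_covariate, ]
cohort <- step2[step2$in_window, ]

# Each patient is counted at the first criterion they fail
excluded <- c(
  prior   = nrow(base)  - nrow(step1),
  missing = nrow(step1) - nrow(step2),
  outside = nrow(step2) - nrow(cohort)
)

# The arrow label is built from the same numbers as the boxes
excluded_label <- sprintf("Excluded\nn = %s",
                          formatC(sum(excluded), format = "d", big.mark = ","))
```

<p>For the figure above, the same bookkeeping gives 4,108 + 962 + 1,327 = 6,397 excluded, and 29,614 - 6,397 = 23,217 in the primary cohort.</p>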
<h2 id="the-audit-trail-is-part-of-the-story">The audit trail is part of
the story</h2>
<p>Another common workflow is less glamorous but just as important: data
validation.</p>
<p><img decoding="async" src="https://i1.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/03-data-cleaning-audit.png?w=578&#038;ssl=1"
alt="Data cleaning and validation flowchart" data-recalc-dims="1" /></p>
<p>Many research projects have a small data-engineering pipeline even
when nobody calls it that. Data arrive through forms, imports, manual
entry, and collaborator spreadsheets. Then someone checks missing
fields, duplicates, impossible dates, inconsistent IDs, and
outliers.</p>
<p>That process is often hidden in prose. A compact flowchart can make
it visible without turning the methods section into a systems manual. It
is also a useful project-management figure: the same chart can be shown
to clinicians, data managers, statisticians, and co-authors.</p>
<p>Note how the box shapes carry meaning here. Ellipses, databases,
documents, tapes, and diamonds all come from dedicated
<code>box*Grob()</code> helpers:</p>
<pre>
input_gp &lt;- gpar(fill = &quot;#F3F8FF&quot;, col = &quot;#3B73C5&quot;, lwd = 1.3)
process_gp &lt;- gpar(fill = &quot;#FFF4C7&quot;, col = &quot;#C69214&quot;, lwd = 1.3)
issue_gp &lt;- gpar(fill = &quot;#FCE4EC&quot;, col = &quot;#AD1457&quot;, lwd = 1.2)
output_gp &lt;- gpar(fill = &quot;#E8F5E9&quot;, col = &quot;#2E7D32&quot;, lwd = 1.3)
note_gp &lt;- gpar(fill = &quot;#FFFFFF&quot;, col = &quot;#607D8B&quot;, lwd = 1, lty = 2)
con_gp &lt;- gpar(col = &quot;#555555&quot;, fill = &quot;#555555&quot;, lwd = 1.3)
issue_con_gp &lt;- gpar(col = &quot;#AD1457&quot;, fill = &quot;#AD1457&quot;, lwd = 1.1)

main_path &lt;- list(&quot;validation&quot;, &quot;clean&quot;)
issue_column_x &lt;- 0.08
log_column_x &lt;- 0.92
input_shape_width &lt;- unit(42, &quot;mm&quot;)
input_shape_height &lt;- unit(24, &quot;mm&quot;)
issue_shape_width &lt;- unit(48, &quot;mm&quot;)
issue_shape_height &lt;- unit(14, &quot;mm&quot;)

grid.newpage()
flowchart(
  inputs = list(
    web = boxEllipseGrob(&quot;REDCap\nform&quot;,
                         width = input_shape_width,
                         height = input_shape_height,
                         box_gp = input_gp),
    import = boxDatabaseGrob(&quot;CSV\nimport&quot;,
                             width = input_shape_width,
                             height = input_shape_height,
                             box_gp = input_gp),
    manual = boxDocumentGrob(&quot;Manual\nentry&quot;,
                             width = input_shape_width,
                             height = input_shape_height,
                             box_gp = input_gp)
  ),
  shape_note = boxGrob(
    &quot;Shape indicates\nsource type&quot;,
    just = &quot;left&quot;,
    width = unit(36, &quot;mm&quot;),
    box_gp = note_gp
  ),
  validation = boxTapeGrob(
    &quot;Validation queue\nIDs, dates, ranges, missingness&quot;,
    width = unit(.58, &quot;npc&quot;),
    height = unit(.14, &quot;npc&quot;),
    box_gp = process_gp
  ),
  issues = list(
    missing = boxDiamondGrob(&quot;Missing\nfields&quot;,
                             width = issue_shape_width,
                             height = issue_shape_height,
                             box_gp = issue_gp),
    duplicate = boxDiamondGrob(&quot;Duplicate\nID&quot;,
                               width = issue_shape_width,
                               height = issue_shape_height,
                               box_gp = issue_gp),
    outlier = boxDiamondGrob(&quot;Outlier\nvalue&quot;,
                             width = issue_shape_width,
                             height = issue_shape_height,
                             box_gp = issue_gp)
  ),
  log = boxDocumentsGrob(
    &quot;Issue log\nqueries sent\nchanges reviewed&quot;,
    width = unit(48, &quot;mm&quot;),
    height = unit(.44, &quot;npc&quot;),
    box_gp = issue_gp
  ),
  clean = boxDatabaseGrob(
    &quot;Analysis-ready dataset\nlocked for report&quot;,
    width = unit(.44, &quot;npc&quot;),
    height = unit(.16, &quot;npc&quot;),
    box_gp = output_gp
  )
) |&gt;
  spread(axis = &quot;y&quot;, margin = unit(7, &quot;mm&quot;),
         exclude = list(&quot;issues&quot;, &quot;shape_note&quot;)) |&gt;
  spread(axis = &quot;x&quot;, subelement = &quot;inputs&quot;,
         from = 0, to = 0.7, margin = 0.05,
         type = &quot;center&quot;) |&gt;
  equalizeWidths(subelement = main_path) |&gt;
  align(axis = &quot;x&quot;, subelement = &quot;validation&quot;, reference = &quot;inputs&quot;) |&gt;
  align(axis = &quot;x&quot;, subelement = &quot;clean&quot;, reference = &quot;validation&quot;) |&gt;
  align(axis = &quot;y&quot;, subelement = &quot;shape_note&quot;, reference = &quot;inputs&quot;) |&gt;
  align(axis = &quot;y&quot;, subelement = &quot;log&quot;,
        references = list(&quot;validation&quot;, &quot;clean&quot;)) |&gt;
  spread(axis = &quot;y&quot;, subelement = &quot;issues&quot;,
         from = position(&quot;log&quot;, position = &quot;top&quot;, type = &quot;y&quot;),
         to = position(&quot;log&quot;, position = &quot;bottom&quot;, type = &quot;y&quot;),
         margin = unit(2, &quot;mm&quot;)) |&gt;
  move(subelement = &quot;shape_note&quot;, x = 0.95, just = &quot;right&quot;) |&gt;
  move(subelement = &quot;issues&quot;, x = issue_column_x, just = &quot;left&quot;) |&gt;
  move(subelement = &quot;log&quot;, x = log_column_x, just = &quot;right&quot;) |&gt;
  connect(&quot;inputs&quot;, &quot;validation&quot;,
          type = &quot;vertical_axis&quot;, lty_gp = con_gp, arrow_size = 3) |&gt;
  connect(&quot;issues&quot;, &quot;log&quot;,
          type = &quot;horizontal_axis&quot;, lty_gp = issue_con_gp, arrow_size = 3) |&gt;
  connect(&quot;validation&quot;, &quot;clean&quot;, type = &quot;vertical_axis&quot;,
          lty_gp = con_gp, arrow_size = 3, smooth = TRUE) |&gt;
  print()
</pre>
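<p>The issue counts that feed the log box can come from the same script. A minimal base-R sketch, assuming a hypothetical raw extract with an <code>id</code>, a <code>visit_date</code>, and a systolic blood pressure column <code>sbp</code>; the plausibility range of 60 to 260 mmHg is an illustrative choice, not a recommendation:</p>

```r
# Hypothetical raw extract combining form, import, and manual entries
raw <- data.frame(
  id = c("P01", "P02", "P02", "P03", "P04", "P05"),
  visit_date = as.Date(c("2025-03-01", "2025-03-04", "2025-03-04",
                         NA, "2031-01-01", "2025-03-09")),
  sbp = c(128, 141, 141, 135, NA, 412)
)

# Fixed reference date so the check is reproducible
today <- as.Date("2026-06-24")

issues <- c(
  missing     = sum(!complete.cases(raw)),                 # any empty field
  duplicate   = sum(duplicated(raw$id)),                   # repeated IDs
  future_date = sum(raw$visit_date > today, na.rm = TRUE), # impossible dates
  outlier     = sum(raw$sbp < 60 | raw$sbp > 260, na.rm = TRUE)
)
issues
```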
<h2 id="follow-up-is-rarely-just-down-the-page">Follow-up is rarely just
down the page</h2>
<p>Longitudinal studies often need to distinguish between people who are
lost, censored, withdrawn, dead, or still contributing information up to
a time point. A simple downward flow can imply that everyone leaving a
box disappears from the analysis, which is not always true.</p>
<p><img decoding="async" src="https://i0.wp.com/www-static.gforge.se/wp-content/uploads/2026/06/04-followup-accounting.png?w=578&#038;ssl=1"
alt="Follow-up accounting with dotted return arrows" data-recalc-dims="1" /></p>
<p>Dotted return arrows are useful for this. They can show that a
participant left direct follow-up but still contributes information to
the final analysis up to censoring. That is a visual detail, but it
communicates an analytical idea.</p>
<p>This is where small flowchart improvements matter: not because the
reader cares about the drawing API, but because the figure can express
the study design more faithfully.</p>
<pre>
options(boxGrobTxtPadding = unit(1, &quot;mm&quot;))

main_gp &lt;- gpar(fill = &quot;#FFFFFF&quot;, col = &quot;#263238&quot;, lwd = 1.2)
arm_gp &lt;- gpar(fill = &quot;#E3F2FD&quot;, col = &quot;#1565C0&quot;, lwd = 1.3)
ex_gp &lt;- gpar(fill = &quot;#FFF8E1&quot;, col = &quot;#C69214&quot;, lwd = 1.2)
con_gp &lt;- gpar(col = &quot;#1565C0&quot;, fill = &quot;#1565C0&quot;, lwd = 1.3)
side_gp &lt;- gpar(col = &quot;#C69214&quot;, fill = &quot;#C69214&quot;, lwd = 1.2)
dotted_gp &lt;- gpar(col = &quot;#455A64&quot;, fill = &quot;#455A64&quot;, lwd = 1.1, lty = 2)

arm_from &lt;- .24
arm_to &lt;- .76
box_width &lt;- unit(54, &quot;mm&quot;)
ex_width &lt;- unit(45, &quot;mm&quot;)
ex_page_margin &lt;- 0.03           # excluded columns hug the page edge by this npc margin
side_offset &lt;- unit(4, &quot;mm&quot;)     # side branches step out this far before turning to the excluded box
fan_in_offset &lt;- unit(2, &quot;mm&quot;)   # dotted return line runs 2 mm outside the excluded boxes

grid.newpage()
flowchart(
  rando = boxGrob(&quot;Randomised\nN = 197&quot;, box_gp = main_gp),
  groups = list(
    boxGrob(&quot;96 assigned to intervention\n95 received treatment&quot;,
            box_gp = arm_gp),
    boxGrob(&quot;101 assigned to control\n93 received treatment&quot;,
            box_gp = arm_gp)
  ),
  ex1 = list(
    boxGrob(&quot;8 died\n1 withdrew consent&quot;, just = &quot;left&quot;, box_gp = ex_gp),
    boxGrob(&quot;18 died\n1 withdrew consent&quot;, just = &quot;left&quot;, box_gp = ex_gp)
  ),
  groups1 = list(
    boxGrob(&quot;87 completed day 30\nfollow-up&quot;, box_gp = arm_gp),
    boxGrob(&quot;79 completed day 30\nfollow-up&quot;, box_gp = arm_gp)
  ),
  ex2 = list(
    boxGrob(&quot;8 died&quot;, just = &quot;left&quot;, box_gp = ex_gp),
    boxGrob(&quot;9 died\n1 withdrew consent\n2 lost to follow-up&quot;,
            just = &quot;left&quot;, box_gp = ex_gp)
  ),
  groups2 = list(
    boxGrob(&quot;79 completed day 180\nfollow-up&quot;, box_gp = arm_gp),
    boxGrob(&quot;68 completed day 180\nfollow-up&quot;, box_gp = arm_gp)
  ),
  analysis = list(
    boxGrob(&quot;95 included in primary\noutcome analysis&quot;, box_gp = arm_gp),
    boxGrob(&quot;95 included in primary\noutcome analysis&quot;, box_gp = arm_gp)
  )
) |&gt;
  spread(axis = &quot;y&quot;, margin = unit(0.02, &quot;npc&quot;)) |&gt;
  equalizeWidths(subelement = stringr::regex(&quot;^groups|analysis&quot;), width = box_width) |&gt;
  equalizeHeights(subelement = stringr::regex(&quot;^groups|analysis&quot;)) |&gt;
  equalizeWidths(subelement = stringr::regex(&quot;^ex&quot;), width = ex_width) |&gt;
  spread(subelement = stringr::regex(&quot;^groups|analysis&quot;), axis = &quot;x&quot;,
         from = arm_from, to = arm_to, type = &quot;center&quot;) |&gt;
  move(subelement = &quot;rando&quot;,
       x = position(&quot;groups&quot;, position = &quot;center&quot;, type = &quot;x&quot;)) |&gt;
  move(subelement = list(c(&quot;ex1&quot;, 1), c(&quot;ex2&quot;, 1)),
       x = ex_page_margin, just = &quot;left&quot;) |&gt;
  move(subelement = list(c(&quot;ex1&quot;, 2), c(&quot;ex2&quot;, 2)),
       x = 1 - ex_page_margin, just = &quot;right&quot;) |&gt;
  connect(&quot;rando&quot;, &quot;groups&quot;, type = &quot;N&quot;, lty_gp = con_gp, arrow_size = 3, smooth = TRUE) |&gt;
  connect(c(&quot;groups$1&quot;, &quot;groups1$1&quot;), c(&quot;ex1$1&quot;, &quot;ex2$1&quot;),
          type = &quot;side&quot;, lty_gp = side_gp, arrow_size = 3,
          side = &quot;left&quot;, end_side = &quot;right&quot;,
          side_route = &quot;outside&quot;, side_offset = side_offset) |&gt;
  connect(c(&quot;groups$2&quot;, &quot;groups1$2&quot;), c(&quot;ex1$2&quot;, &quot;ex2$2&quot;),
          type = &quot;side&quot;, lty_gp = side_gp, arrow_size = 3,
          side = &quot;right&quot;, end_side = &quot;left&quot;,
          side_route = &quot;outside&quot;, side_offset = side_offset) |&gt;
  connect(&quot;groups&quot;, &quot;groups1&quot;, type = &quot;vertical&quot;, lty_gp = con_gp, arrow_size = 3) |&gt;
  connect(&quot;groups1&quot;, &quot;groups2&quot;, type = &quot;vertical&quot;, lty_gp = con_gp, arrow_size = 3) |&gt;
  connect(&quot;groups2&quot;, &quot;analysis&quot;, type = &quot;vertical&quot;, lty_gp = con_gp, arrow_size = 3) |&gt;
  connect(list(&quot;ex1$1&quot;, &quot;ex2$1&quot;), &quot;analysis$1&quot;, type = &quot;side&quot;,
          lty_gp = dotted_gp, arrow_size = 3,
          side = &quot;left&quot;, end_side = &quot;left&quot;,
          side_route = &quot;outside&quot;,
          side_offset = fan_in_offset) |&gt;
  connect(list(&quot;ex1$2&quot;, &quot;ex2$2&quot;), &quot;analysis$2&quot;, type = &quot;side&quot;,
          lty_gp = dotted_gp, arrow_size = 3,
          side = &quot;right&quot;, end_side = &quot;right&quot;,
          side_route = &quot;outside&quot;,
          side_offset = fan_in_offset) |&gt;
  print()
</pre>
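<p>The analytical idea behind the dotted arrows is that a censored participant stays in the risk set until the day they leave. A minimal base-R sketch with hypothetical follow-up times; in a real analysis, <code>survival::Surv()</code> and <code>survival::survfit()</code> handle this bookkeeping:</p>

```r
# Hypothetical follow-up data: days observed and outcome status
fu <- data.frame(
  id = 1:6,
  time = c(30, 45, 180, 180, 90, 120),
  event = c(1, 0, 0, 0, 1, 0)  # 1 = outcome observed, 0 = censored
)

# Number at risk on a given day: everyone observed at least that long
at_risk <- function(t) sum(fu$time >= t)

# Person-time contributed by censored participants before they left
censored_days <- sum(fu$time[fu$event == 0])

at_risk(40)
at_risk(60)
censored_days
```

<p>Participant 2, censored on day 45, is still counted in <code>at_risk(40)</code> but not in <code>at_risk(60)</code>: leaving follow-up early shortens a contribution rather than erasing it.</p>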
<h2 id="why-this-belongs-in-gmisc">Why this belongs in Gmisc</h2>
<p>Gmisc has always collected the small tools I found myself needing
around medical statistics: descriptive tables, transition plots, and
grid-based figures. Flowcharts fit that same pattern. They are not a
statistical model, but they are part of how research is
communicated.</p>
<p>The new flowchart work in Gmisc 3.4.0 is therefore aimed at practical
problems:</p>
<ul>
<li>making CONSORT-like diagrams less painful to draw</li>
<li>keeping grouped stages aligned and readable</li>
<li>making arrows behave predictably</li>
<li>supporting side paths, return paths, and repeated box patterns</li>
<li>producing figures that can be regenerated when the study
changes</li>
</ul>
<p>The vignette contains the full API and examples:</p>
<pre>
vignette(&quot;Grid-based_flowcharts&quot;, package = &quot;Gmisc&quot;)
</pre>
<p>The figures in this post are intentionally close to things
researchers already have in their workflow: trial enrollment, registry
construction, data validation, and follow-up accounting. My hope is that
they make the flowchart tools feel less like a drawing utility and more
like a small extension of the analysis itself.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://gforge.se/2026/06/flowcharts-that-belong-in-the-analysis-pipeline/"> R – G-Forge</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/flowcharts-that-belong-in-the-analysis-pipeline/">Flowcharts that belong in the analysis pipeline</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402192</post-id>	</item>
		<item>
		<title>EFA vs CFA: Key Differences Between Exploratory &#038; Confirmatory Factor Analysis (R)</title>
		<link>https://www.r-bloggers.com/2026/06/efa-vs-cfa-key-differences-between-exploratory-confirmatory-factor-analysis-r/</link>
		
		<dc:creator><![CDATA[Unknown]]></dc:creator>
		<pubDate>Tue, 23 Jun 2026 08:50:22 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=284a072c3a8208b0d7d6047ae5220cfa</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
EFA explores unknown factor structures while CFA tests a predefined model. Compare both methods side by side with R code using psych and lavaan — know which to use in your dissertation.</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/efa-vs-cfa-key-differences-between-exploratory-confirmatory-factor-analysis-r/">EFA vs CFA: Key Differences Between Exploratory & Confirmatory Factor Analysis (R)</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html"> RStudioDataLab</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>

<!--Explicit Post Snippet-->
<div aria-hidden="true" hidden="">EFA explores unknown factor structures while CFA tests a predefined model. Compare both methods side by side with R code using psych and lavaan — know which to use in your dissertation.</div>
<p style="text-align: justify;"><span class="dropCap">E</span>FA explores unknown factor structures while CFA tests a predefined model. Compare both methods side by side with R code using psych and lavaan — know which to use in your dissertation.</p>
<!--Explicit Post Thumbnail-->
  <div class="separator"><img alt="EFA vs CFA: Key Differences Between Exploratory &#038; Confirmatory Factor Analysis (R)" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMVZ7BkU8eyoyYY9ghMQuuxGvCSVLPfxx-fA58rqtOaTk0fWoAW-EM8cqpDakLrcOe7BbrjORhH9Pzu8YziN1UR-zoYCf4BRN3nAUkJ8t1-kdhrmB90l9lhe-ZgMDrZcqySBuTVMAnTM9tKgkSISqeUEPQLjg4VYNa5ulG__WiUeQnOankaBtsFsGiP6M/s16000/Confirmatory%20Factor%20Analysis%20vs%20Exploratory%20Factor%20Analysis%20What's%20the%20Difference%20(1).webp" title="EFA vs CFA: Key Differences Between Exploratory &#038; Confirmatory Factor Analysis (R)" /></div>

<!--============================================================
     KEY POINTS
============================================================-->
<h2 style="text-align: justify;">Key Points</h2>
<ol style="text-align: left;">
  <li>EFA and CFA are both factor analysis methods, but they serve opposite purposes: EFA <em>discovers</em> factor structure; CFA <em>tests</em> a pre-specified structure.</li>
  <li>The core difference lies in factor loadings — EFA lets all items load freely on all factors; CFA constrains items to load only on their pre-assigned factor.</li>
  <li>In R, EFA uses the <strong>psych</strong> package (<code>fa()</code>); CFA uses the <strong>lavaan</strong> package (<code>cfa()</code>).</li>
  <li>CFA is evaluated with goodness-of-fit indices (CFI, TLI, RMSEA, and SRMR), each with widely cited, though debated, cutoff values.</li>
  <li>Many dissertations use both in sequence — EFA on a pilot sample, CFA on an independent main sample.</li></ol>
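<p>The fit indices listed above are usually compared against rule-of-thumb cutoffs. A minimal sketch, assuming the widely cited Hu and Bentler (1999) values (CFI and TLI at least 0.95, RMSEA at most 0.06, SRMR at most 0.08); <code>check_fit()</code> is a hypothetical helper that accepts any named numeric vector, such as the one returned by <code>lavaan::fitMeasures()</code>:</p>

```r
# Hypothetical helper: compare fit indices with commonly cited cutoffs.
# These are rules of thumb, not strict pass/fail tests.
check_fit <- function(fm) {
  fm <- unclass(fm)  # drop lavaan's print class if present
  c(
    cfi   = fm[["cfi"]]   >= 0.95,
    tli   = fm[["tli"]]   >= 0.95,
    rmsea = fm[["rmsea"]] <= 0.06,
    srmr  = fm[["srmr"]]  <= 0.08
  )
}

# Illustrative values only, not output from a fitted model
example_fit <- c(cfi = 0.962, tli = 0.948, rmsea = 0.051, srmr = 0.043)
check_fit(example_fit)
```

<p>With a fitted lavaan model, <code>check_fit(fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr")))</code> gives the same kind of summary.</p>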

<!--============================================================
     QUICK COMPARISON TABLE — this is the most critical addition
============================================================-->
<h2 style="text-align: left;">EFA vs CFA: Quick Comparison</h2>

<div class="table sticky bordered stripped hovered">
  <table>
    <thead>
      <tr>
        <th>Criterion</th>
        <th>EFA (Exploratory)</th>
        <th>CFA (Confirmatory)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td><strong>Purpose</strong></td>
        <td>Discover unknown factor structure</td>
        <td>Test a pre-specified factor structure</td>
      </tr>
      <tr>
        <td><strong>Theory required?</strong></td>
        <td>No — data-driven</td>
        <td>Yes — theory-driven</td>
      </tr>
      <tr>
        <td><strong>Number of factors</strong></td>
        <td>Estimated from the data (e.g., parallel analysis)</td>
        <td>Specified by researcher in advance</td>
      </tr>
      <tr>
        <td><strong>Factor loadings</strong></td>
        <td>All items load freely on all factors</td>
        <td>Items constrained to pre-assigned factors</td>
      </tr>
      <tr>
        <td><strong>Factor rotation</strong></td>
        <td>Usually applied when more than one factor is extracted (oblimin / varimax)</td>
        <td>Not applicable</td>
      </tr>
      <tr>
        <td><strong>Model fit indices</strong></td>
        <td>Secondary (<code>fa()</code> reports RMSEA, TLI, and BIC)</td>
        <td>CFI, TLI, RMSEA, SRMR, χ²</td>
      </tr>
      <tr>
        <td><strong>R package</strong></td>
        <td>psych — <code>fa()</code></td>
        <td>lavaan — <code>cfa()</code></td>
      </tr>
      <tr>
        <td><strong>Research stage</strong></td>
        <td>Early / scale development</td>
        <td>Later / scale validation, SEM</td>
      </tr>
      <tr>
        <td><strong>When to use in dissertation</strong></td>
        <td>New or adapted questionnaire, weak prior theory</td>
        <td>Established scale, strong prior theory, SEM prep</td>
      </tr>
    </tbody>
  </table>
</div>
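<p>The "Number of factors" row is where the two methods differ most in practice. A minimal sketch of parallel analysis with <code>psych::fa.parallel()</code>, using the nine ability items <code>x1</code> to <code>x9</code> from the <code>HolzingerSwineford1939</code> data that ships with lavaan as stand-in data (any numeric item data frame would do):</p>

```r
library(psych)   # fa.parallel()
library(lavaan)  # only used here for the HolzingerSwineford1939 example data

items <- HolzingerSwineford1939[, paste0("x", 1:9)]

# Compare observed eigenvalues with those from random data of the same size;
# fa = "fa" restricts the comparison to common-factor eigenvalues.
set.seed(123)
pa <- fa.parallel(items, fm = "ml", fa = "fa")
pa$nfact  # suggested number of factors
```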

<!--Table of Contents-->
<details class="sp toc" open="">
  <summary data-hide="Hide all" data-show="Show all">Table of Contents</summary>
  <div class="aToc"></div>
</details>

<!--============================================================
     H1
============================================================-->
<h1 style="text-align: left;">EFA vs CFA: What Is the Difference Between Exploratory and Confirmatory Factor Analysis?</h1>

<p style="text-align: justify;"><span class="dropCap">E</span>FA and CFA are both forms of <a href="https://www.rstudiodatalab.com/2023/09/How-I-Perform-Factor-Analysis-in-R.html" rel="nofollow" target="_blank">factor analysis</a> — statistical methods that model the relationships between observed variables (e.g., survey items) and unobserved <strong>latent variables</strong> called factors. They are not competing methods; they serve different phases of measurement research. </p>
<blockquote class="s2"><p><b>EFA (Exploratory Factor Analysis)</b> lets the data reveal its own factor structure when you have no strong prior theory.</p></blockquote><blockquote class="s2"><p><b>CFA (Confirmatory Factor Analysis)</b> tests whether your observed data fit a structure you have already specified based on theory or prior EFA results.</p></blockquote><p style="text-align: justify;"><br /></p>

<p class="note"><strong>Dissertation quick-pick rule:</strong><br />
Using a new or adapted questionnaire with no established factor structure? → Start with <strong>EFA</strong>.<br />
Replicating an established scale (Big Five, JSS, UTAUT) in a new sample? → Use <strong>CFA</strong> directly.<br />
Both in the same study? → EFA on the pilot sample; CFA on the main independent sample.</p>

<div class="separator" style="clear: both; text-align: center;">
  <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqBNcSsJjGl84T0rehkxzdxPNVH4p5KI9YRKmsd9lXAs2HLDv8Mn4uy9ioBZYBPv_nbQ_0e8Xey61jjBvYdO2aWdlDBGHzFmiEJaybKTrHBgxGfOC-IVzD69LCzAsW5v1WMHEyLzEEEgDbrg53LkqbGqL9D8wyrJykjfvHdQpy_xp0efGG1GTeC1L3F9A/s853/napkin-selection%20(1).png" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank">
    <img alt="EFA vs CFA diagram showing structural differences between exploratory and confirmatory factor analysis path models" border="0" data-original-height="264" data-original-width="450" src="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqBNcSsJjGl84T0rehkxzdxPNVH4p5KI9YRKmsd9lXAs2HLDv8Mn4uy9ioBZYBPv_nbQ_0e8Xey61jjBvYdO2aWdlDBGHzFmiEJaybKTrHBgxGfOC-IVzD69LCzAsW5v1WMHEyLzEEEgDbrg53LkqbGqL9D8wyrJykjfvHdQpy_xp0efGG1GTeC1L3F9A/s16000/napkin-selection%20(1).png?resize=450%2C264&#038;ssl=1" title="EFA vs CFA: Path Model Structural Differences" data-recalc-dims="1" />
  </a>
</div>

<!--============================================================
     WHAT IS EFA?
============================================================-->
<h2 style="text-align: left;">What Is Exploratory Factor Analysis (EFA)?</h2>

<p>Exploratory Factor Analysis (EFA) is a data-driven method that identifies the number and nature of <a href="https://www.rstudiodatalab.com/2023/09/Factor-Analysis.html" rel="nofollow" target="_blank">latent factors</a> underlying a set of observed variables, without imposing any prior constraints on which items load on which factors. EFA is theory-generating — it reveals patterns in your data that can later be formalised into a testable model for CFA.</p>

<p><strong>Example:</strong> You design a 20-item questionnaire to measure academic motivation and have no prior theory about how many sub-dimensions exist. EFA might group those 20 items into, say, three to five factors (e.g., intrinsic motivation, extrinsic motivation, self-regulation) based on their intercorrelations. The pattern of loadings comes from the data rather than from your assumptions, although you still decide how many factors to retain and how to label them.</p>

<h3 style="text-align: left;">When to Use EFA</h3>

<div class="alert info"><strong>Use EFA when:</strong><br />
• You are developing a new scale or questionnaire from scratch.<br />
• You are adapting an existing scale to a new language, culture, or context.<br />
• The literature shows limited, mixed, or no prior evidence about the factor structure.<br />
• You want to reduce many variables into a smaller set of interpretable dimensions.<br />
• You are in the early, exploratory phase of your measurement validation workflow.
</div>

<h3 style="text-align: left;">EFA Assumptions and Sample Size Requirements</h3>

<p>Before running EFA, check that your data meet several requirements. Items should be measured on interval or ordinal scales. <a href="https://www.rstudiodatalab.com/2023/07/Likert-Scale.html" rel="nofollow" target="_blank">Likert scales</a> are appropriate, although for items with few response options polychoric correlations (<code>cor = &quot;poly&quot;</code> in <code>fa()</code>) are often preferred. Check factorability with the <strong>Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy</strong> (value > 0.60 required; > 0.80 is good) and <strong>Bartlett&#8217;s test of sphericity</strong> (p < 0.05 required). For sample size, a common minimum is <strong>100 participants</strong>, and many methodologists advise <strong>5–10 cases per item</strong>. A 20-item scale therefore needs 100–200 respondents, with 200 as the safer target. Treat these as rules of thumb: stable solutions need fewer cases when items have high communalities and more cases when they do not.</p>
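<p>To see what <code>KMO()</code> and <code>cortest.bartlett()</code> compute, here is a minimal base-R sketch on simulated data. The six items a1–b3 and their two underlying factors are hypothetical and are not part of <code>bfi</code>. The sketch uses the textbook formulas. KMO compares squared correlations with squared partial (anti-image) correlations. Bartlett's statistic is -(n - 1 - (2p + 5)/6) * log(det(R)) on p(p - 1)/2 degrees of freedom. It is an illustration, not a replacement for the psych functions.</p>

<pre>
# Minimal sketch: overall KMO and Bartlett's test in base R (no packages)
set.seed(42)
n  <- 300
f1 <- rnorm(n)                       # hypothetical latent factor 1
f2 <- rnorm(n)                       # hypothetical latent factor 2
items <- cbind(
  a1 = f1 + rnorm(n, sd = 0.7), a2 = f1 + rnorm(n, sd = 0.7), a3 = f1 + rnorm(n, sd = 0.7),
  b1 = f2 + rnorm(n, sd = 0.7), b2 = f2 + rnorm(n, sd = 0.7), b3 = f2 + rnorm(n, sd = 0.7)
)

# Overall KMO: squared correlations relative to squared correlations plus
# squared partial (anti-image) correlations, summed over off-diagonal pairs
kmo_overall <- function(R) {
  Rinv <- solve(R)
  P    <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))  # partial correlations
  off  <- upper.tri(R)                                 # each item pair once
  r2   <- sum(R[off]^2)
  p2   <- sum(P[off]^2)
  r2 / (r2 + p2)
}

# Bartlett's test of sphericity: H0 says the correlation matrix is an identity matrix
bartlett_sphericity <- function(R, n) {
  p     <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df    <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

R   <- cor(items)
kmo <- kmo_overall(R)
bt  <- bartlett_sphericity(R, n)
round(kmo, 2)        # compare with the 0.60 / 0.80 guidelines
bt$p.value < 0.05    # TRUE means the items are not all uncorrelated
</pre>

<p>On the same data, <code>psych::KMO(items)</code> and <code>psych::cortest.bartlett(items, n = 300)</code> should report closely matching overall values.</p>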

<h3 style="text-align: left;">How to Run EFA in R Using the psych Package</h3>

<p>The <a href="https://www.rstudiodatalab.com/2023/09/exploratory-factor-analysis-efa-in-r.html" rel="nofollow" target="_blank">psych package</a> provides a comprehensive EFA workflow in R. The example below uses the <code>bfi</code> dataset distributed with psych (2,800 respondents; 25 Big Five personality items plus gender, education, and age):</p>

<pre>
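# Step 1: Install (once) and load psych; GPArotation is needed for oblimin rotation
install.packages(c(&quot;psych&quot;, &quot;GPArotation&quot;))
library(psych)

# Step 2: Load data. bfi = Big Five Inventory: 25 personality items plus gender,
# education and age (recent releases also distribute bfi in the psychTools package)
data(bfi)
bfi_items &lt;- bfi[, 1:25]          # Select only the 25 personality items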

# Step 3: Test factorability before running EFA
KMO(bfi_items)                     # KMO &gt; 0.60 required
cortest.bartlett(bfi_items, n = nrow(bfi_items))  # p &lt; 0.05 required

# Step 4: Determine the number of factors via parallel analysis
fa.parallel(bfi_items, fm = &quot;ml&quot;, fa = &quot;fa&quot;)

# Step 5: Run EFA with 5 factors and oblimin (oblique) rotation
efa_model &lt;- fa(bfi_items,
                nfactors = 5,
                rotate    = &quot;oblimin&quot;,  # oblique — factors are allowed to correlate
                fm        = &quot;ml&quot;)       # maximum likelihood estimation

# Step 6: Inspect factor loadings (show only loadings &gt; 0.30)
print(efa_model$loadings, cutoff = 0.3)

# Step 7: View factor structure diagram
fa.diagram(efa_model)
</pre>

<p class="note wr"><strong>Rotation choice:</strong><br />
Use <code>rotate = &quot;oblimin&quot;</code> (oblique) as your default. Psychological and social science constructs are usually correlated, and if the factors turn out to be nearly uncorrelated, an oblique rotation gives results close to an orthogonal one anyway. Only use <code>rotate = &quot;varimax&quot;</code> (orthogonal) if you have a strong theoretical reason to assume independent factors.</p>
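<p>Horn's parallel analysis, which <code>fa.parallel()</code> automates, retains factors whose observed eigenvalues exceed those obtained from random data of the same size. The base-R sketch below shows the idea on simulated data with two hypothetical factors. It makes two simplifying assumptions: it uses principal-component eigenvalues of the correlation matrix, and it compares them with the 95th percentile of 100 random datasets. <code>fa.parallel()</code> also reports factor-based eigenvalues and uses its own defaults. For comparison, the sketch also prints the count from the eigenvalue &gt; 1 rule.</p>

<pre>
# Minimal sketch of Horn's parallel analysis (component eigenvalues, base R)
set.seed(123)
n  <- 400
p  <- 10
f1 <- rnorm(n); f2 <- rnorm(n)                # two hypothetical factors
X  <- cbind(
  sapply(1:5, function(i) f1 + rnorm(n)),     # items 1-5 reflect factor 1
  sapply(1:5, function(i) f2 + rnorm(n))      # items 6-10 reflect factor 2
)

obs_eig <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random normal data with the same n and p
n_sims   <- 100
rand_eig <- replicate(n_sims, {
  Z <- matrix(rnorm(n * p), n, p)
  eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
})                                            # p x n_sims matrix
threshold <- apply(rand_eig, 1, quantile, probs = 0.95)

# Retain factors up to the first observed eigenvalue that falls below the threshold
n_parallel <- which(obs_eig <= threshold)[1] - 1
n_kaiser   <- sum(obs_eig > 1)                # eigenvalue > 1 rule, for comparison
c(parallel = n_parallel, kaiser = n_kaiser)
</pre>

<p>In this clean simulation, both rules recover two factors. With many items and modest correlations, the eigenvalue &gt; 1 rule tends to retain more factors than parallel analysis.</p>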

<!--============================================================
     WHAT IS CFA?
============================================================-->
<h2 style="text-align: left;">What Is Confirmatory Factor Analysis (CFA)?</h2>

<p>Confirmatory Factor Analysis (CFA) is a theory-testing method. You specify a measurement model in advance (how many factors exist, which items load on which factors, and whether factors are correlated), then evaluate how well your observed data fit that model using goodness-of-fit indices. CFA is part of the Structural Equation Modelling (SEM) framework and is a standard way to gather evidence of <strong>construct validity</strong> in dissertation research.</p>

<p><strong>Example:</strong> Prior research and your literature review both support a 2-factor model of academic motivation: intrinsic and extrinsic motivation. You specify this 2-factor CFA model with 10 items assigned in advance, fit it to your data, and evaluate whether CFI > 0.95, RMSEA < 0.06, and SRMR < 0.08. If fit is acceptable, your data are consistent with the hypothesised structure and you can proceed to SEM.</p>
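<p>The fit decision in this example can be scripted. The helper below is a hypothetical convenience function and is not part of lavaan. It takes a named vector of fit measures, such as the one returned by <code>fitMeasures(fit, c("cfi", "rmsea", "srmr"))</code>, and compares each value with the cut-offs quoted above. The values passed in are made up for illustration.</p>

<pre>
# Hypothetical helper: compare fit measures with conventional cut-offs
check_fit <- function(fm, cfi_min = 0.95, rmsea_max = 0.06, srmr_max = 0.08) {
  c(cfi   = unname(fm["cfi"])   > cfi_min,
    rmsea = unname(fm["rmsea"]) < rmsea_max,
    srmr  = unname(fm["srmr"])  < srmr_max)
}

# Made-up values standing in for fitMeasures(fit, c("cfi", "rmsea", "srmr"))
fm <- c(cfi = 0.962, rmsea = 0.071, srmr = 0.049)
check_fit(fm)
all(check_fit(fm))   # FALSE here: RMSEA misses the 0.06 cut-off
</pre>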

<div class="separator" style="clear: both; text-align: center;">
  <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3V3j5XhioJCGPVHPvbaUxSX_iD5VgLluogNVByZ1k8iFOumHwDEyxTajYLeHzB2_bS8yJH4HSaLRJoIU3hiC9SkoNjzS7noNMtPtP_eAnOcIW8RK2gMy2sHhNA7wUtIQikDTg1OVQTZ1zTvleBtcfBv01Gsb0_Bd8ACFNNT-v-z-Drl_KtaQEVrrBj08/s1200/Confirmatory%20Factor%20Analysis%20vs%20Exploratory%20Factor%20Analysis%20What's%20the%20Difference%20(3).webp" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank">
    <img loading="lazy" alt="Confirmatory Factor Analysis CFA model — factor loadings, latent variables, measurement model structure" border="0" data-original-height="630" data-original-width="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3V3j5XhioJCGPVHPvbaUxSX_iD5VgLluogNVByZ1k8iFOumHwDEyxTajYLeHzB2_bS8yJH4HSaLRJoIU3hiC9SkoNjzS7noNMtPtP_eAnOcIW8RK2gMy2sHhNA7wUtIQikDTg1OVQTZ1zTvleBtcfBv01Gsb0_Bd8ACFNNT-v-z-Drl_KtaQEVrrBj08/w640-h336/Confirmatory%20Factor%20Analysis%20vs%20Exploratory%20Factor%20Analysis%20What's%20the%20Difference%20(3).webp" title="CFA Measurement Model: Latent Variables and Factor Loadings" width="450" />
  </a>
</div>

<h3 style="text-align: left;">When to Use CFA</h3>

<div class="alert info"><strong>Use CFA when:</strong><br />
• You are validating or replicating an established measurement scale in a new sample.<br />
• You have strong theoretical or prior empirical support for a specific factor structure.<br />
• You need to assess construct validity (convergent and discriminant validity).<br />
• You are preparing data for Structural Equation Modelling (SEM): testing the measurement model with CFA is the usual first step.<br />
• You want to compare competing theoretical models (e.g., one-factor vs two-factor structure).<br />
• You are confirming the factor structure found in an earlier EFA.
</div>

<h3 style="text-align: left;">How to Run CFA in R Using the lavaan Package</h3>

<p>The <a class="extL" href="https://lavaan.ugent.be/" rel="nofollow" target="_blank">lavaan package</a> is the standard CFA and SEM tool in R. Here is a full working example using two factors from the <code>bfi</code> dataset:</p>

<pre>
# Step 1: Install (once) and load lavaan; psych supplies the bfi data
install.packages(&quot;lavaan&quot;)
library(lavaan)
library(psych)
data(bfi)

# Step 2: Define your measurement model using lavaan syntax
# Each line: factor_name =~ item1 + item2 + item3 ...
cfa_model &lt;- '
  agreeableness     =~ A1 + A2 + A3 + A4 + A5
  conscientiousness =~ C1 + C2 + C3 + C4 + C5
'

# Step 3: Fit the model to your data (ML estimation; rows with missing
# responses are dropped listwise by default)
fit &lt;- cfa(cfa_model,
            data   = bfi,
            std.lv = TRUE)   # fix latent variances to 1 so every loading is estimated

# Step 4: Full model summary with standardised loadings and fit indices
summary(fit, fit.measures = TRUE, standardized = TRUE)

# Step 5: Extract specific fit indices for reporting
fitMeasures(fit, c(&quot;cfi&quot;, &quot;tli&quot;, &quot;rmsea&quot;, &quot;rmsea.ci.lower&quot;,
                   &quot;rmsea.ci.upper&quot;, &quot;srmr&quot;, &quot;chisq&quot;, &quot;df&quot;, &quot;pvalue&quot;))

# Step 6: Inspect modification indices if fit is poor
modindices(fit, sort. = TRUE, maximum.number = 10)
</pre>
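<p>Before fitting, it helps to check that the model has positive degrees of freedom. Below is a minimal sketch of the counting rule (often called the t-rule). The number of unique variances and covariances, p(p + 1)/2, minus the number of free parameters gives the model df. The sketch assumes simple structure (each item loads on one factor), <code>std.lv = TRUE</code>, and no correlated residuals. For the two-factor model above, the result should match the df that <code>fitMeasures(fit, "df")</code> reports. A non-negative df is necessary but not sufficient for identification.</p>

<pre>
# Minimal sketch of the counting rule for a simple-structure CFA
# Assumptions: one loading per item, std.lv = TRUE, no correlated residuals
cfa_df <- function(n_items, n_factors, correlated = TRUE) {
  n_moments <- n_items * (n_items + 1) / 2        # unique variances + covariances
  n_fcov    <- if (correlated) n_factors * (n_factors - 1) / 2 else 0
  n_free    <- 2 * n_items + n_fcov               # loadings + residual variances + factor covariances
  n_moments - n_free
}

cfa_df(10, 2)   # the agreeableness/conscientiousness model above: 55 - 21 = 34
cfa_df(3, 1)    # one factor, three items: just-identified (df = 0)
cfa_df(2, 1)    # one factor, two items: negative df, under-identified
</pre>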

<!--============================================================
     FULL COMPARISON TABLE
============================================================-->
<h2 style="text-align: left;">EFA vs CFA: Full Head-to-Head Comparison</h2>

<div class="separator" style="clear: both; text-align: center;">
  <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjerHwCeSts8n0yZNqsP526LF4Mw0Jrp-HxtPiQ0aBeM-nt3CZ4rYROA0lKK0IiICI3n4ihU65f3vlaJmqHcVaiV9NEamaDD2Z_0mGR9U0Pp9TeXGBkLp5ibmp0MqHFLItzqOo_51cLCeLMuNgLRGICkaKZU3hz9m9aJajn8Awa2pYCbCORiEGNKiLTAkE/s1200/Confirmatory%20Factor%20Analysis%20vs%20Exploratory%20Factor%20Analysis%20What's%20the%20Difference.webp" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank">
    <img loading="lazy" alt="EFA vs CFA full comparison of exploratory and confirmatory factor analysis methods in research" border="0" data-original-height="630" data-original-width="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjerHwCeSts8n0yZNqsP526LF4Mw0Jrp-HxtPiQ0aBeM-nt3CZ4rYROA0lKK0IiICI3n4ihU65f3vlaJmqHcVaiV9NEamaDD2Z_0mGR9U0Pp9TeXGBkLp5ibmp0MqHFLItzqOo_51cLCeLMuNgLRGICkaKZU3hz9m9aJajn8Awa2pYCbCORiEGNKiLTAkE/w640-h336/Confirmatory%20Factor%20Analysis%20vs%20Exploratory%20Factor%20Analysis%20What's%20the%20Difference.webp" title="EFA vs CFA: Full Comparison" width="450" />
  </a>
</div>

<div class="table sticky bordered stripped hovered">
  <table>
    <thead>
      <tr>
        <th>Feature</th>
        <th>EFA (Exploratory Factor Analysis)</th>
        <th>CFA (Confirmatory Factor Analysis)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Goal</td>
        <td>Discover and generate factor structure</td>
        <td>Test and confirm pre-specified structure</td>
      </tr>
      <tr>
        <td>Type</td>
        <td>Theory-generating</td>
        <td>Theory-testing</td>
      </tr>
      <tr>
        <td>Prior theory required</td>
        <td>No</td>
        <td>Yes</td>
      </tr>
      <tr>
        <td>Number of factors</td>
        <td>Data-driven (parallel analysis, scree plot)</td>
        <td>Specified by researcher before analysis</td>
      </tr>
      <tr>
        <td>Factor loadings</td>
        <td>All items load freely on all factors</td>
        <td>Pre-specified; cross-loadings fixed to zero</td>
      </tr>
      <tr>
        <td>Factor correlations</td>
        <td>Depends on rotation method chosen</td>
        <td>Specified by researcher (correlated or orthogonal)</td>
      </tr>
      <tr>
        <td>Estimation method</td>
        <td>ML, PAF (principal axis factoring), ULS</td>
        <td>ML, WLS, WLSMV (for ordinal/Likert data)</td>
      </tr>
      <tr>
        <td>Rotation</td>
        <td>Usually applied: oblimin (oblique) or varimax (orthogonal)</td>
        <td>Not applicable</td>
      </tr>
      <tr>
        <td>Model fit evaluation</td>
        <td>Optional; <code>fa()</code> reports χ², RMSEA, and TLI</td>
        <td>CFI, TLI, RMSEA, SRMR, χ²/df</td>
      </tr>
      <tr>
        <td>R package</td>
        <td>psych — <code>fa()</code></td>
        <td>lavaan — <code>cfa()</code></td>
      </tr>
      <tr>
        <td>SPSS equivalent</td>
        <td>Analyze → Dimension Reduction → Factor</td>
        <td>AMOS (or lavaan in R)</td>
      </tr>
      <tr>
        <td>Research application</td>
        <td>Scale development, instrument design, pilot studies</td>
        <td>Scale validation, SEM, multi-group analysis</td>
      </tr>
      <tr>
        <td>Typical research stage</td>
        <td>Early-stage / exploratory</td>
        <td>Later-stage / confirmatory / validation</td>
      </tr>
    </tbody>
  </table>
</div>

<!--============================================================
     MODEL FIT INDICES — NEW SECTION
============================================================-->
<h2 style="text-align: left;">CFA Model Fit Indices: RMSEA, CFI, TLI, and SRMR Explained</h2>

<p>When you run CFA, evaluating model fit is not optional: it is the core output, and reviewers are unlikely to accept a CFA reported without fit indices. The table below lists the indices you should report, what each measures, and commonly used thresholds. Treat these thresholds as rules of thumb rather than strict cut-offs. Report at least three indices, and never rely on χ² alone, because it grows with sample size.</p>
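<p>To see why χ² alone is a poor guide, recall that the ML test statistic is approximately (N - 1) × F_ML, where F_ML is the minimised discrepancy. Software differs on whether it multiplies by N or N - 1. The sketch below holds a small, hypothetical amount of misfit fixed (F_ML = 0.10 on 34 df, chosen only for illustration) and varies N. The p-value falls from far above .05 to far below it, while RMSEA stays under .06 throughout.</p>

<pre>
# Same (hypothetical) amount of misfit evaluated at different sample sizes
F_min    <- 0.10   # minimised ML discrepancy, held fixed for illustration
model_df <- 34
N        <- c(200, 500, 1000, 2000)

chisq <- (N - 1) * F_min                                         # ML test statistic
p     <- pchisq(chisq, model_df, lower.tail = FALSE)
rmsea <- sqrt(pmax(chisq - model_df, 0) / (model_df * (N - 1)))  # sample RMSEA

data.frame(N, chisq, p = signif(p, 3), rmsea = round(rmsea, 3))
</pre>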

<div class="table sticky bordered stripped hovered">
  <table>
    <thead>
      <tr>
        <th>Fit Index</th>
        <th>What it measures</th>
        <th>Acceptable threshold</th>
        <th>Good fit</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td><strong>CFI</strong> — Comparative Fit Index</td>
        <td>How much better your model fits than a baseline model in which all items are uncorrelated</td>
        <td>> 0.90</td>
        <td>> 0.95</td>
      </tr>
      <tr>
        <td><strong>TLI</strong> — Tucker-Lewis Index</td>
        <td>Like CFI but penalises model complexity</td>
        <td>> 0.90</td>
        <td>> 0.95</td>
      </tr>
      <tr>
        <td><strong>RMSEA</strong> — Root Mean Square Error of Approximation</td>
        <td>Approximate misfit per degree of freedom, adjusted for sample size; lower is better</td>
        <td>< 0.08</td>
        <td>< 0.06</td>
      </tr>
      <tr>
        <td><strong>SRMR</strong> — Standardised Root Mean Square Residual</td>
        <td>Average difference between observed and predicted correlations</td>
        <td>< 0.08</td>
        <td>< 0.05</td>
      </tr>
      <tr>
        <td><strong>χ² / df ratio</strong></td>
        <td>Overall model misfit (avoid as sole criterion — n-sensitive)</td>
        <td>< 3.0</td>
        <td>< 2.0</td>
      </tr>
    </tbody>
  </table>
</div>
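<p>CFI, TLI, and RMSEA can be computed from the model χ² and the baseline (independence-model) χ², both of which lavaan reports. The sketch below uses the standard textbook formulas. The inputs are hypothetical; <code>df_base = 45</code> is the baseline df for 10 items, p(p - 1)/2. lavaan's own output, especially robust or scaled versions, may differ slightly, so report what <code>fitMeasures()</code> gives.</p>

<pre>
# Standard formulas for CFI, TLI and RMSEA from chi-square statistics
fit_indices <- function(chisq, df, chisq_base, df_base, N) {
  d_model <- max(chisq - df, 0)                # model non-centrality estimate
  d_base  <- max(chisq_base - df_base, 0)      # baseline non-centrality estimate
  cfi   <- 1 - d_model / max(d_model, d_base)
  tli   <- (chisq_base / df_base - chisq / df) / (chisq_base / df_base - 1)
  rmsea <- sqrt(d_model / (df * (N - 1)))
  c(cfi = cfi, tli = tli, rmsea = rmsea, chisq_df = chisq / df)
}

# Hypothetical values for illustration only
round(fit_indices(chisq = 67.2, df = 34, chisq_base = 900, df_base = 45, N = 443), 3)
</pre>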

<p class="note"><strong>APA reporting template for dissertations:</strong><br />
"The two-factor CFA model demonstrated acceptable fit: χ²(34) = 67.2, p < .001, CFI = .96, TLI = .95, RMSEA = .047 [90% CI: .029–.064], SRMR = .051. All standardised factor loadings were statistically significant and exceeded .50 (range: .53–.78)."</p>

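<p>If you script your write-up, a small helper can format the values in the APA style used in the template. APA style drops the leading zero for statistics that cannot exceed 1. The function names <code>apa_num()</code> and <code>apa_fit()</code> are hypothetical. The numbers are the illustrative ones from the template, not real results.</p>

<pre>
# Hypothetical helpers: format CFA fit statistics in APA style
apa_num <- function(x, digits) {
  # APA omits the leading zero for values bounded by 1 (CFI, TLI, RMSEA, SRMR, p)
  sub("^0\\.", ".", formatC(x, format = "f", digits = digits))
}

apa_fit <- function(chisq, df, p, cfi, tli, rmsea, rmsea_lo, rmsea_hi, srmr) {
  p_txt <- if (p < 0.001) "p < .001" else paste("p =", apa_num(p, 3))
  sprintf("\u03c7\u00b2(%d) = %.1f, %s, CFI = %s, TLI = %s, RMSEA = %s [90%% CI: %s\u2013%s], SRMR = %s",
          df, chisq, p_txt, apa_num(cfi, 2), apa_num(tli, 2),
          apa_num(rmsea, 3), apa_num(rmsea_lo, 3), apa_num(rmsea_hi, 3),
          apa_num(srmr, 3))
}

apa_fit(chisq = 67.2, df = 34, p = 0.0006, cfi = 0.96, tli = 0.95,
        rmsea = 0.047, rmsea_lo = 0.029, rmsea_hi = 0.064, srmr = 0.051)
</pre>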
<!--============================================================
     USING BOTH EFA AND CFA — NEW SECTION
============================================================-->
<h2 style="text-align: left;">Can You Use Both EFA and CFA in the Same Dissertation?</h2>

<p>Yes, and in many quantitative dissertations using both is the <em>most rigorous</em> approach. The critical rule: you must use <strong>independent datasets</strong>. Using the same data for EFA and then CFA is a methodological error that peer reviewers and examiners are likely to flag. A CFA model derived from EFA results tends to fit the data it was derived from, so good fit on that data is circularity, not validation.</p>

<div class="alert warning"><strong>Critical error to avoid:</strong> Never run EFA and CFA on the same dataset. A CFA model is likely to fit the data it was built from, so that fit adds little evidence. Always use separate, independent samples for each phase.
</div>

<p>The correct sequential approach, step by step:</p>

<ol class="steps">
  <li><strong>Collect two independent datasets.</strong> Option A: run a pilot study (n ≥ 100–150) for EFA, then collect a main sample (n ≥ 200–300) for CFA. Option B: collect one large dataset and split it randomly 50/50.</li>
  <li><strong>Run EFA on Sample 1</strong> using the <a href="https://www.rstudiodatalab.com/2023/09/exploratory-factor-analysis-efa-in-r.html" rel="nofollow" target="_blank">psych package in R</a>. Report KMO, Bartlett's test, parallel analysis output, factor loadings, communalities, and percentage variance explained.</li>
  <li><strong>Specify the CFA model</strong> based on the EFA factor structure. Assign each item to the factor it loaded most strongly on. Drop items with cross-loadings above 0.30 on two or more factors.</li>
  <li><strong>Run CFA on Sample 2</strong> using lavaan. Fit the model, evaluate fit indices, and inspect modification indices if fit is inadequate.</li>
  <li><strong>Report both analyses</strong> in your methodology chapter, clearly stating which sample was used for which analysis and why the sequential approach was chosen.</li>
</ol>
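<p>Option B in step 1 can be sketched in base R. The data frame <code>survey</code> and its columns are hypothetical placeholders for your own data. Calling <code>set.seed()</code> makes the split reproducible, so you can report it in your methods section.</p>

<pre>
# Minimal sketch: random 50/50 split into EFA and CFA halves
set.seed(2024)                                   # record the seed in your methods
survey <- data.frame(id = 1:600,                 # hypothetical data frame
                     q1 = sample(1:5, 600, replace = TRUE),
                     q2 = sample(1:5, 600, replace = TRUE))

efa_rows   <- sample(nrow(survey), size = floor(nrow(survey) / 2))
sample_efa <- survey[efa_rows, ]                 # Sample 1: EFA
sample_cfa <- survey[-efa_rows, ]                # Sample 2: CFA (the remaining rows)

c(efa = nrow(sample_efa), cfa = nrow(sample_cfa))
</pre>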

<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlyMQ8QB_BPaugLZ3Npl9OYdK0a-7KsguvAZTLd1QXIj4JLj9GNIG44rXTN2jAksrl_-ALEK_gvBFcUZYln33dg-YdtO15yzzYcntYntcPBNfCdxVvUkfV-HrTbYomQqYwt4Mv3YjGbuOxli3cOMCKxfwyxsRaES_6dHxQiQi78mwU9Ice24GseNqyyrY/s744/napkin-selection%20(2).png" style="margin-left: 1em; margin-right: 1em; text-align: center;" rel="nofollow" target="_blank">
  <img alt="EFA and CFA sequential research workflow — pilot sample for EFA then independent main sample for CFA dissertation process" border="0" data-original-height="576" data-original-width="450" src="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlyMQ8QB_BPaugLZ3Npl9OYdK0a-7KsguvAZTLd1QXIj4JLj9GNIG44rXTN2jAksrl_-ALEK_gvBFcUZYln33dg-YdtO15yzzYcntYntcPBNfCdxVvUkfV-HrTbYomQqYwt4Mv3YjGbuOxli3cOMCKxfwyxsRaES_6dHxQiQi78mwU9Ice24GseNqyyrY/s16000/napkin-selection%20(2).png?resize=450%2C576&#038;ssl=1" title="EFA + CFA Sequential Research Workflow for Dissertations" data-recalc-dims="1" />
</a>

<!--============================================================
     DECISION GUIDE
============================================================-->
<h2 style="text-align: left;">How to Choose Between EFA and CFA: Decision Rules</h2>

<p>The choice between EFA and CFA depends on your research question, the state of the literature, and the purpose of your factor analysis. These rules cover the most common dissertation scenarios:</p>

<ul style="text-align: left;">
  <li>No prior theory about factor structure → <strong>EFA</strong></li>
  <li>Established, well-cited factor structure from prior studies → <strong>CFA</strong></li>
  <li>Adapting a scale to a new language, culture, or population → <strong>EFA first, then CFA</strong></li>
  <li>Building toward Structural Equation Modelling → <strong>CFA mandatory</strong></li>
  <li>Developing a new psychometric scale from scratch → <strong>EFA first, CFA to validate</strong></li>
  <li>Comparing two competing theoretical models → <strong>CFA</strong> (use model comparison with <a href="https://www.rstudiodatalab.com/2025/07/likelihood-ratio-test-r.html" rel="nofollow" target="_blank">likelihood ratio test</a>)</li>
  <li>Mixed or contradictory literature on factor structure → <strong>EFA</strong></li>
  <li>Testing construct validity (convergent + discriminant) → <strong>CFA</strong></li>
</ul>
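<p>For nested models estimated with ordinary ML, the likelihood ratio (χ² difference) test is simple arithmetic. The sketch below uses hypothetical values for a one-factor model (35 df for 10 items) nested in a two-factor model (34 df). In lavaan, <code>anova()</code> on two fitted models, or <code>lavTestLRT()</code>, runs this comparison directly. Those functions also apply a scaled difference test for robust estimators such as MLR or WLSMV, where the plain subtraction below is not valid.</p>

<pre>
# Chi-square difference test for nested models (plain ML only)
chisq_diff_test <- function(chisq_restricted, df_restricted, chisq_full, df_full) {
  d_chisq <- chisq_restricted - chisq_full   # restricted model fits worse
  d_df    <- df_restricted - df_full         # parameters freed in the full model
  c(delta_chisq = d_chisq, delta_df = d_df,
    p = pchisq(d_chisq, d_df, lower.tail = FALSE))
}

# Hypothetical results: one-factor model (restricted) vs two-factor model (full)
chisq_diff_test(chisq_restricted = 412.5, df_restricted = 35,
                chisq_full = 67.2, df_full = 34)
</pre>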

<!--============================================================
     EXAMPLES
============================================================-->
<h2 style="text-align: left;">Examples of EFA and CFA in Research</h2>

<h3 style="text-align: left;">Example 1: EFA of Personality Traits — The Big Five</h3>

<p>The most influential application of EFA is the development of the <strong>Big Five personality model</strong>. Researchers applied EFA to hundreds of personality adjectives across multiple independent samples. Without any prior constraint on which traits should cluster together, these analyses repeatedly produced five factors: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. This is EFA at its best: no prior theory constrained the analysis, and the five-factor structure has largely replicated across cultures and languages.</p>

<p>To replicate this in R, run EFA with 5 factors and oblimin rotation on the <code>bfi</code> dataset in the psych package (code shown above). The five resulting factors map largely onto the Big Five dimensions, with most primary loadings above 0.40. Expect a few items to load more weakly or to cross-load. Reverse-keyed items (for example A1 and C4) typically show negative loadings.</p>

<h3 style="text-align: left;">Example 2: CFA of Job Satisfaction — The JSS</h3>

<p>Spector's (1985) Job Satisfaction Survey (JSS) proposes a 9-factor model covering pay, promotion, supervision, fringe benefits, contingent rewards, operating procedures, co-workers, nature of work, and communication. A researcher validating the JSS in a healthcare sample would use CFA: specify all 36 items loading on their designated factors, fit the model with lavaan, and evaluate whether CFI > 0.95, RMSEA < 0.06, and SRMR < 0.08.</p>

<p>If fit is poor (e.g., RMSEA > 0.08), consult modification indices and consider whether any theoretically justifiable correlated residuals between items within the same facet would improve fit. Only modify parameters where there is both statistical and substantive justification. Need help interpreting your CFA output or writing up your <a href="https://www.rstudiodatalab.com/2023/08/p-value-less-than-0.05.html" rel="nofollow" target="_blank">p-values and fit indices</a>? <a class="extL" href="https://wa.me/923106367532" rel="nofollow" target="_blank">Message Dr. Zubair on WhatsApp.</a></p>

<!--============================================================
     COMMON MISTAKES — NEW SECTION
============================================================-->
<h2 style="text-align: left;">Common EFA and CFA Mistakes to Avoid</h2>

<ul style="text-align: left;">
  <li><strong>Using EFA and CFA on the same sample</strong> — the most common dissertation error. Always use independent samples.</li>
  <li><strong>Choosing the number of EFA factors by the eigenvalue > 1 rule alone</strong>: this rule often over-extracts factors. Use parallel analysis (<code>fa.parallel()</code>) instead.</li>
  <li><strong>Using orthogonal rotation (varimax) by default</strong> — most psychological and social science constructs are correlated. Use oblimin rotation unless theory dictates independence.</li>
  <li><strong>Reporting only χ² for CFA</strong>: with large samples (a few hundred cases or more), χ² is usually significant even when misfit is trivial. Always report CFI, TLI, RMSEA, and SRMR alongside it.</li>
  <li><strong>Keeping cross-loading items in the CFA model</strong> — items that loaded > 0.30 on two or more factors in EFA should be dropped before CFA specification.</li>
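  <li><strong>Fewer than three items per factor</strong>: a factor measured by only two items is under-identified on its own. In a multi-factor model, it is identified only through its correlations with other factors, which makes its estimates unstable. Aim for at least 3 indicators per factor.</li>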
  <li><strong>Confusing EFA with PCA</strong> — <a href="https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html" rel="nofollow" target="_blank">Principal Component Analysis (PCA)</a> is a data reduction method, not a factor analysis technique. See our guide on <a href="https://www.rstudiodatalab.com/2023/09/factor-analysis-and-principal-component-analysis.html" rel="nofollow" target="_blank">Factor Analysis vs PCA</a> for the full distinction.</li>
</ul>
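<p>The cross-loading rule from the list above can be applied mechanically to a loading matrix. In the sketch below, the pattern matrix is hypothetical. With psych, you could obtain a real one with <code>unclass(efa_model$loadings)</code>. The 0.30 cut-off mirrors the rule of thumb used in this article. Flagged items still deserve a substantive look before you drop them.</p>

<pre>
# Hypothetical pattern matrix: 6 items x 2 factors (not from a real EFA)
L <- matrix(c(0.72, 0.05,
              0.65, 0.12,
              0.41, 0.38,    # item3: salient on both factors
              0.08, 0.70,
              0.02, 0.66,
              0.35, 0.33),   # item6: salient on both factors
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("item", 1:6), c("F1", "F2")))

cutoff    <- 0.30
n_salient <- rowSums(abs(L) > cutoff)                   # loadings above the cut-off per item
primary   <- colnames(L)[apply(abs(L), 1, which.max)]   # factor with the largest loading

cross_items <- rownames(L)[n_salient >= 2]   # candidates to drop before the CFA
keep        <- n_salient == 1                # items with exactly one salient loading
assignment  <- split(rownames(L)[keep], primary[keep])

cross_items
assignment   # item-to-factor map to translate into lavaan syntax
</pre>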

<!--============================================================
     DISSERTATION REPORTING GUIDE — NEW SECTION
============================================================-->
<h2 style="text-align: left;">EFA and CFA in Thesis and Dissertation: Reporting Requirements</h2>

<p>Both analyses appear in the methodology chapter under "Measurement Validation" or "Scale Development." Here is what committees and journal reviewers typically expect:</p>

<ul style="text-align: left;">
  <li><strong>For EFA:</strong> Report KMO value, Bartlett's test (χ², df, p), number of factors extracted, method for determining factor number (parallel analysis recommended), extraction method (ML recommended), rotation method, eigenvalues for retained factors, percentage variance explained by each factor and total, and a complete factor loading matrix with items bolded if > 0.30.</li>
  <li><strong>For CFA:</strong> Report the measurement model diagram (path diagram), sample size, estimation method (ML for continuous/normal data; WLSMV for ordinal/Likert), and fit indices: χ²(df), p-value, CFI, TLI, RMSEA [90% CI], SRMR. Report all standardised factor loadings and their significance. State whether modification indices were consulted and, if the model was modified, provide both theoretical and statistical justification.</li>
  <li><strong>For both in the same study:</strong> Clearly label Sample 1 (EFA) and Sample 2 (CFA) in your methods section. Justify the sequential strategy. Provide descriptive statistics for both samples.</li>
</ul>

<p>For related analyses your dissertation may also require, see our guides on <a href="https://www.rstudiodatalab.com/2023/09/exploratory-factor-analysis-efa-in-r.html" rel="nofollow" target="_blank">EFA in R with psych</a>, <a href="https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html" rel="nofollow" target="_blank">PCA in R</a>, <a href="https://www.rstudiodatalab.com/2023/09/factor-analysis-and-principal-component-analysis.html" rel="nofollow" target="_blank">Factor Analysis vs PCA</a>, and <a href="https://www.rstudiodatalab.com/2024/10/shapiro-wilk-normality-test-shapirotest.html" rel="nofollow" target="_blank">normality testing with Shapiro-Wilk</a> before running your analyses.</p>

<!--============================================================
     CONCLUSION
============================================================-->
<h2 style="text-align: left;">Conclusion</h2>

<p>The difference between EFA and CFA comes down to one question: do you know the factor structure, or are you trying to find it? EFA discovers structure from data when theory is absent or weak. CFA confirms a structure you have already specified when theory is strong or prior EFA results exist. In R, the psych package handles EFA and the lavaan package handles CFA. Both are free, well-documented, and widely used in academic research.</p>

<p>For PhD and Master's dissertation researchers: the most defensible methodology for a new measurement instrument is EFA on a pilot sample followed by CFA on an independent main sample. This sequential approach demonstrates both exploratory rigour and confirmatory validity to examiners and reviewers.</p>

<p>If you need expert help with your EFA or CFA analysis — including running the analysis in R or SPSS, interpreting fit indices, writing up results in APA format, or preparing your methodology chapter — <a class="extL" href="https://wa.me/923106367532" rel="nofollow" target="_blank">contact Dr. Zubair Goraya on WhatsApp</a> or book a session via the link below.</p>

<a class="button" href="https://wa.me/923106367532" rel="nofollow" target="_blank">Get Help With Your Factor Analysis → WhatsApp</a>
<br />
<a class="button ln" href="https://www.rstudiodatalab.com/2023/08/REPLACE_WITH_LIVE_BOOKING_URL" rel="nofollow" target="_blank">Book a Consulting Session</a>

<!--============================================================
     FAQ — FAQPage microdata + accordion component
============================================================-->
<h2 style="text-align: left;">Frequently Asked Questions</h2>

<div class="showH"  >

  <details class="ac"   >
    <summary >What is the difference between EFA and CFA?</summary>
    <div   >
      <p >EFA (Exploratory Factor Analysis) is used when you have no prior theory about the factor structure of your data — it discovers the structure from the data itself. CFA (Confirmatory Factor Analysis) is used when you already have a hypothesised structure and want to test whether your data fit it. EFA is theory-generating; CFA is theory-testing. In R, EFA uses the psych package (fa() function) and CFA uses the lavaan package (cfa() function).</p>
    </div>
  </details>

  <details class="ac"   >
    <summary >What is EFA in research?</summary>
    <div   >
      <p >EFA (Exploratory Factor Analysis) is a statistical method used to identify the underlying latent factor structure of a set of observed variables without imposing any prior constraints. EFA determines how many factors exist in the data and which items cluster onto which factors. It is widely used in scale development, psychometrics, and any research context where the structure of a construct has not yet been established. In R, EFA is performed using the fa() function in the psych package.</p>
    </div>
  </details>

  <details class="ac"   >
    <summary >What is CFA (confirmatory factor analysis)?</summary>
    <div   >
      <p >Confirmatory Factor Analysis (CFA) is a theory-testing method where the researcher specifies in advance how many factors exist, which items load on which factors, and whether factors are correlated. CFA evaluates how well this pre-specified model fits the observed data using goodness-of-fit indices: CFI (> 0.95), TLI (> 0.95), RMSEA (< 0.06), and SRMR (< 0.08). In R, CFA is performed using the cfa() function in the lavaan package.</p>
    </div>
  </details>

  <details class="ac"   >
    <summary >When should I use EFA vs CFA in my dissertation?</summary>
    <div   >
      <p >Use EFA when you are developing a new measurement instrument, adapting an existing scale to a new population, or when the literature provides limited or contradictory evidence about factor structure. Use CFA when you are validating an established scale, testing a theoretically supported structure, or preparing data for Structural Equation Modelling (SEM). Many dissertations use both: EFA on a pilot sample to identify the structure, then CFA on an independent main sample to confirm it.</p>
    </div>
  </details>

  <details class="ac alt"   >
    <summary >Can I use both EFA and CFA in the same study?</summary>
    <div   >
      <p >Yes, but each analysis needs its own independent data. A common approach is to collect a pilot sample (often n ≥ 100) for EFA and a separate main sample (often n ≥ 200) for CFA. Alternatively, randomly split one large dataset 50/50. Running EFA and CFA on the same data is a methodological error: a CFA fitted to the data that suggested the model will tend to fit well, which is circularity, not validation.</p>
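      <p>The 50/50 split can be done in base R. The sketch below uses a hypothetical helper, split_half(), that is not part of psych or lavaan. The simulated responses exist only to show the mechanics of the split and carry no factor structure.</p>
<figure class="highlight"><pre># split_half() is a hypothetical helper: it randomly assigns half of the
# rows to an EFA sample and the remaining rows to an independent CFA sample.
split_half = function(data, seed = 2023) {
  set.seed(seed)                      # make the split reproducible
  all_rows = seq_len(nrow(data))
  efa_rows = sample(all_rows, size = floor(nrow(data) / 2))
  cfa_rows = setdiff(all_rows, efa_rows)
  list(efa = data[efa_rows, , drop = FALSE],
       cfa = data[cfa_rows, , drop = FALSE])
}

# Simulated 5-point responses to six items from 400 respondents
set.seed(1)
responses = as.data.frame(matrix(sample(1:5, 400 * 6, replace = TRUE), ncol = 6))
halves = split_half(responses)
sapply(halves, nrow)   # 200 rows in each half</pre></figure>
      <p>Because the CFA rows are taken with setdiff(), no respondent appears in both halves, and the same seed always gives the same split.</p>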
    </div>
  </details>

  <details class="ac alt"   >
    <summary >What R packages are used for EFA and CFA?</summary>
    <div   >
      <p >For EFA in R, the psych package is the standard tool. Use its fa() function with the nfactors, rotate ("oblimin" or "varimax") and fm ("ml" for maximum likelihood) arguments. Oblimin rotation also requires the GPArotation package. For CFA in R, the lavaan package is a widely used choice: define your measurement model in lavaan syntax, fit it with cfa(), and evaluate fit with fitMeasures(). Install both from CRAN with install.packages("psych") and install.packages("lavaan").</p>
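      <p>The sketch below runs both steps on independent halves of the same dataset. It assumes lavaan's built-in HolzingerSwineford1939 data (301 cases, nine ability tests x1 to x9) as stand-in items. It fits an EFA with psych on a random half, then tests the three-factor model from the lavaan tutorial with a CFA on the other half. The seed and object names are illustrative, and rotate = "oblimin" needs GPArotation installed.</p>
<figure class="highlight"><pre>library(psych)   # fa()
library(lavaan)  # cfa(), fitMeasures() and the HolzingerSwineford1939 data

hs = HolzingerSwineford1939
items = paste0("x", 1:9)

# Split the 301 cases so the EFA and the CFA use independent halves
set.seed(1939)
efa_rows = sample(nrow(hs), 150)
efa_data = hs[efa_rows, items]
cfa_data = hs[-efa_rows, items]

# EFA: three factors, maximum likelihood, oblique (oblimin) rotation
efa_model = fa(efa_data, nfactors = 3, rotate = "oblimin", fm = "ml")
print(efa_model$loadings, cutoff = 0.3)   # hide loadings below 0.30

# CFA: test the three-factor structure on the other half
cfa_syntax = "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
"
cfa_fit = cfa(cfa_syntax, data = cfa_data, std.lv = TRUE)
fitMeasures(cfa_fit, c("cfi", "tli", "rmsea", "srmr"))</pre></figure>
      <p>If the EFA on the first half suggested a different structure, you would change the CFA syntax to match it before fitting the second half.</p>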
    </div>
  </details>

  <details class="ac alt"   >
    <summary >What are acceptable model fit index values for CFA?</summary>
    <div   >
      <p >Widely cited CFA fit cutoffs are: CFI > 0.95 (acceptable: > 0.90), TLI > 0.95 (acceptable: > 0.90), RMSEA < 0.06 (acceptable: < 0.08) and SRMR < 0.08. Never use the chi-square test as the sole criterion. It is highly sensitive to sample size and is often statistically significant in samples above about 200, even when fit is acceptable. Report at least three fit indices in your dissertation, including RMSEA with its 90% confidence interval.</p>
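      <p>You can write these cutoffs as a small base-R check. The helper check_fit() below is hypothetical and is not part of lavaan. It applies the stricter cutoffs: CFI and TLI above 0.95, RMSEA below 0.06, SRMR below 0.08. The input values are made up for illustration.</p>
<figure class="highlight"><pre># check_fit() is a hypothetical helper: it flags which fit indices meet
# CFI above 0.95, TLI above 0.95, RMSEA below 0.06 and SRMR below 0.08.
check_fit = function(fit) {
  meets = c(
    cfi   = fit[["cfi"]] > 0.95,
    tli   = fit[["tli"]] > 0.95,
    rmsea = 0.06 > fit[["rmsea"]],   # RMSEA below 0.06
    srmr  = 0.08 > fit[["srmr"]]     # SRMR below 0.08
  )
  data.frame(index = names(meets),
             value = as.numeric(fit[names(meets)]),
             meets_cutoff = as.logical(meets))
}

# Illustrative values only, not results from a real model
check_fit(c(cfi = 0.962, tli = 0.948, rmsea = 0.055, srmr = 0.041))</pre></figure>
      <p>With a fitted lavaan model, the same helper could be called as check_fit(fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))). A single index just missing its cutoff, like the TLI here, is a prompt for closer inspection, not automatic rejection.</p>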
    </div>
  </details>

</div>


<!--============================================================
     SCHEMA MARKUP — ALL REQUIRED TYPES
============================================================-->

<!--1. Article Schema-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html#article",
  "mainEntityOfPage": "https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html",
  "headline": "EFA vs CFA: Key Differences Between Exploratory & Confirmatory Factor Analysis (R)",
  "name": "EFA vs CFA: Key Differences Between Exploratory & Confirmatory Factor Analysis (R)",
  "url": "https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html",
  "description": "EFA explores unknown factor structures; CFA tests a predefined model. Full comparison with R examples using psych and lavaan packages — includes when to use each for thesis research.",
  "image": {
    "@type": "ImageObject",
    "url": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMVZ7BkU8eyoyYY9ghMQuuxGvCSVLPfxx-fA58rqtOaTk0fWoAW-EM8cqpDakLrcOe7BbrjORhH9Pzu8YziN1UR-zoYCf4BRN3nAUkJ8t1-kdhrmB90l9lhe-ZgMDrZcqySBuTVMAnTM9tKgkSISqeUEPQLjg4VYNa5ulG__WiUeQnOankaBtsFsGiP6M/s1200/Confirmatory%20Factor%20Analysis%20vs%20Exploratory%20Factor%20Analysis%20What's%20the%20Difference%20(1).webp",
    "width": 1200,
    "height": 630
  },
  "datePublished": "2023-08-11",
  "dateModified": "2025-06-22",
  "author": {
    "@type": "Person",
    "name": "Zubair Goraya",
    "url": "https://zubairgoraya.rstudiodatalab.com/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "RStudioDataLab",
    "description": "Unlock the secrets of data analysis with our comprehensive RStudio tutorials. From mastering the basics to tackling complex challenges, our blog provides the tools and knowledge you need to take your data analysis skills to the next level.",
    "logo": {
      "@type": "ImageObject",
      "url": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP_jQ9kbWVkjZ1T-5_osDo_JBuq2RAOB4_9Z726e3GPurZSUICYi5U_70kzDHQXZXzkgvhskpoXgTPeaolBDTZpz0qouYLOB8k5ge142uh5cIyJpVLYNvJ17V1wwNVxWKfX5LWq_WvU7nKpSTPvSGxgOQOSbJuXZEo1ylOsD7WJcIuTtx41Ofwo4cjwo0/s500/RStudioDataLab%20500%20x500.png",
      "width": 500,
      "height": 500
    }
  }
}
</script>

<!--2. FAQPage Schema-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the difference between EFA and CFA?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "EFA (Exploratory Factor Analysis) discovers factor structure from data without prior constraints. CFA (Confirmatory Factor Analysis) tests whether data fit a pre-specified theoretical structure. EFA is theory-generating; CFA is theory-testing. In R, EFA uses the psych package (fa()) and CFA uses lavaan (cfa())."
      }
    },
    {
      "@type": "Question",
      "name": "What is EFA in research?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "EFA (Exploratory Factor Analysis) identifies the underlying latent factor structure of observed variables without prior constraints. It determines how many factors exist and which items cluster on which factors. Used in scale development and early-stage research. In R, performed with fa() in the psych package."
      }
    },
    {
      "@type": "Question",
      "name": "What is CFA (confirmatory factor analysis)?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Confirmatory Factor Analysis (CFA) tests whether observed data fit a pre-specified measurement model. The researcher specifies in advance how many factors exist, which items load on which factors, and whether factors are correlated. Model fit is evaluated using CFI (>0.95), TLI (>0.95), RMSEA (<0.06), and SRMR (<0.08). In R, performed using the cfa() function in the lavaan package."
      }
    },
    {
      "@type": "Question",
      "name": "When should I use EFA vs CFA in my dissertation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use EFA when developing a new measurement instrument or when the literature provides limited evidence about factor structure. Use CFA when validating an established scale, testing a theory-supported structure, or preparing data for SEM. Many dissertations use both: EFA on a pilot sample, then CFA on an independent main sample."
      }
    },
    {
      "@type": "Question",
      "name": "Can I use both EFA and CFA in the same study?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, but you must use independent datasets. Use a pilot sample (often n ≥ 100) for EFA and a separate main sample (often n ≥ 200) for CFA, or split a large dataset randomly 50/50. Running both on the same data is a methodological error: a CFA fitted to the data that suggested the model will tend to fit well, which is circular."
      }
    },
    {
      "@type": "Question",
      "name": "What R packages are used for EFA and CFA?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "For EFA in R, use the psych package — specifically the fa() function with nfactors, rotate, and fm arguments. For CFA in R, use the lavaan package with the cfa() function. Install both via install.packages('psych') and install.packages('lavaan') from CRAN."
      }
    },
    {
      "@type": "Question",
      "name": "What are acceptable model fit index values for CFA?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Acceptable CFA fit thresholds: CFI > 0.90 (good: > 0.95), TLI > 0.90 (good: > 0.95), RMSEA < 0.08 (good: < 0.06), SRMR < 0.08. Never rely on chi-square alone. Report at least three fit indices including RMSEA with its 90% confidence interval."
      }
    }
  ]
}
</script>

<!--3. HowTo Schema — How to Run EFA and CFA in R-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Run EFA and CFA in R: Step-by-Step Guide",
  "description": "Complete guide to performing Exploratory Factor Analysis (EFA) using the psych package and Confirmatory Factor Analysis (CFA) using the lavaan package in R.",
  "step": [
    {
      "@type": "HowToSection",
      "name": "Running EFA in R (psych package)",
      "itemListElement": [
        {
          "@type": "HowToStep",
          "position": "1",
          "name": "Install and load the psych package",
          "text": "Run install.packages('psych') then library(psych) in your R console."
        },
        {
          "@type": "HowToStep",
          "position": "2",
          "name": "Test data factorability",
          "text": "Use KMO() to check the Kaiser-Meyer-Olkin measure (a value of at least 0.60 is commonly recommended) and cortest.bartlett() for Bartlett's test of sphericity (p < 0.05 suggests the correlation matrix is suitable for factoring)."
        },
        {
          "@type": "HowToStep",
          "position": "3",
          "name": "Determine number of factors",
          "text": "Run fa.parallel() for parallel analysis, widely regarded as one of the most accurate methods for deciding how many factors to retain."
        },
        {
          "@type": "HowToStep",
          "position": "4",
          "name": "Run EFA with oblimin rotation",
          "text": "Use fa(data, nfactors = n, rotate = 'oblimin', fm = 'ml'). Oblimin (oblique) rotation allows factors to correlate, which is appropriate for most social science constructs; it requires the GPArotation package to be installed."
        },
        {
          "@type": "HowToStep",
          "position": "5",
          "name": "Interpret factor loadings",
          "text": "Use print(efa_model$loadings, cutoff = 0.3) to show only loadings of at least 0.30 in absolute value. Items loading above 0.30 on two or more factors are cross-loading and are common candidates for removal before CFA."
        }
      ]
    },
    {
      "@type": "HowToSection",
      "name": "Running CFA in R (lavaan package)",
      "itemListElement": [
        {
          "@type": "HowToStep",
          "position": "1",
          "name": "Install and load lavaan",
          "text": "Run install.packages('lavaan') then library(lavaan)."
        },
        {
          "@type": "HowToStep",
          "position": "2",
          "name": "Specify the measurement model",
          "text": "Define your model using lavaan syntax: factor_name =~ item1 + item2 + item3. Each line assigns items to a factor."
        },
        {
          "@type": "HowToStep",
          "position": "3",
          "name": "Fit the CFA model",
          "text": "Run fit <- cfa(model, data = your_data, std.lv = TRUE) to fit the CFA model to your data."
        },
        {
          "@type": "HowToStep",
          "position": "4",
          "name": "Evaluate model fit",
          "text": "Run summary(fit, fit.measures = TRUE, standardized = TRUE). Target: CFI > 0.95, TLI > 0.95, RMSEA < 0.06, SRMR < 0.08."
        },
        {
          "@type": "HowToStep",
          "position": "5",
          "name": "Improve fit if needed",
          "text": "Run modindices(fit, sort. = TRUE, maximum.number = 10) to identify which parameter modifications would most improve fit. Only implement theoretically justifiable changes."
        }
      ]
    }
  ]
}
</script>

<!--4. BreadcrumbList Schema-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://www.rstudiodatalab.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Factor Analysis",
      "item": "https://www.rstudiodatalab.com/2023/09/Factor-Analysis.html"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "EFA vs CFA: Key Differences",
      "item": "https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html"
    }
  ]
}
</script>

<!--5. Organization Schema-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "RStudioDataLab",
  "url": "https://www.rstudiodatalab.com/",
  "logo": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP_jQ9kbWVkjZ1T-5_osDo_JBuq2RAOB4_9Z726e3GPurZSUICYi5U_70kzDHQXZXzkgvhskpoXgTPeaolBDTZpz0qouYLOB8k5ge142uh5cIyJpVLYNvJ17V1wwNVxWKfX5LWq_WvU7nKpSTPvSGxgOQOSbJuXZEo1ylOsD7WJcIuTtx41Ofwo4cjwo0/s500/RStudioDataLab%20500%20x500.png",
  "telephone": "+923106367532",
  "sameAs": [
    "https://www.facebook.com/RStudioDataLab",
    "https://www.instagram.com/rstudiodatalab/",
    "https://twitter.com/rstudiodatalab",
    "https://youtube.com/@rstudiodatalab",
    "https://www.linkedin.com/company/rstudiodatalabs",
    "https://www.tiktok.com/@rstudiodatalab",
    "https://whatsapp.com/channel/0029VaBzfy80G0XbCXhGGA16"
  ]
}
</script>

<!--6. Person Schema-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Zubair Goraya",
  "url": "https://zubairgoraya.rstudiodatalab.com/",
  "image": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXfle0EkfKiRR_8FJF9PzmQi7KFzefg1cqMqkdjLNB8fmLyfS1XkuwSk0JGKM00WdzzLhzgHNP7xbgCW9QSIqUDoSUDiqK8Xz-GEpwiqOffmpE0LJDR92r6MVIMsRjhLwhUAK9F-DVddAqtZmzFZW3jwrOGXxX3ThTT0nGzX8x11bkIzc/s220/WhatsApp_Image_2022-07-16_at_7.55.30_PM-removebg-preview-removebg-preview.webp",
  "jobTitle": "PhD Scholar & Statistical Consultant",
  "worksFor": {
    "@type": "Organization",
    "name": "RStudioDataLab"
  },
  "sameAs": [
    "https://www.facebook.com/zubeegoraya",
    "https://twitter.com/ZubairGoraya",
    "https://youtube.com/@data.03"
  ]
}
</script>

<!--7. WebSite Schema-->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "RStudioDataLab",
  "url": "https://www.rstudiodatalab.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://www.rstudiodatalab.com/search?q={search_term_string}"
    },
    "query-input": "required name=search_term_string"
  }
}
</script>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html"> RStudioDataLab</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/efa-vs-cfa-key-differences-between-exploratory-confirmatory-factor-analysis-r/">EFA vs CFA: Key Differences Between Exploratory & Confirmatory Factor Analysis (R)</a>]]></content:encoded>
					
		
		<enclosure url="https://www.data03.online/2023/09/Factor-Analysis.html" length="0" type="text/html" />

		<post-id xmlns="com-wordpress:feed-additions:1">402198</post-id>	</item>
		<item>
		<title>United Kingdom prime ministers by @ellis2013nz</title>
		<link>https://www.r-bloggers.com/2026/06/united-kingdom-prime-ministers-by-ellis2013nz/</link>
		
		<dc:creator><![CDATA[free range statistics - R]]></dc:creator>
		<pubDate>Mon, 22 Jun 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://freerangestats.info/blog/2026/06/23/uk-prime-ministers</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> With the recent resignation announcement from United Kingdom (UK) Prime Minister Keir Starmer, there has been a flurry of commentary about how many UK prime ministers there have been in the past decade, short terms for prime ministers, and so on. ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/united-kingdom-prime-ministers-by-ellis2013nz/">United Kingdom prime ministers by @ellis2013nz</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://freerangestats.info/blog/2026/06/23/uk-prime-ministers"> free range statistics - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>With the recent resignation announcement from United Kingdom (UK) Prime Minister Keir Starmer, there has been a flurry of commentary about how many UK prime ministers there have been in the past decade, short terms for prime ministers, and so on. I wanted a historical perspective, so I grabbed the data from <a href="https://en.wikipedia.org/wiki/List_of_prime_ministers_of_the_United_Kingdom" rel="nofollow" target="_blank">Wikipedia</a>. It has a convenient single-table list of all UK prime ministers since the title was first applied, informally, to Robert Walpole. Walpole was effectively prime minister of the Kingdom of Great Britain from 1721 onwards.</p>

<p>The first prime minister of the <em>United</em> Kingdom of Great Britain <em>and Ireland</em> was William Pitt the Younger in 1801. The first of the United Kingdom of Great Britain and <em>Northern</em> Ireland was Andrew Bonar Law in 1922. These distinctions are largely disregarded in this blog post.</p>

<h2 id="downloading-prime-ministerial-data-from-wikipedia">Downloading prime ministerial data from Wikipedia</h2>

<p>Here’s code to download and import that list from Wikipedia. This worked as at 23 June 2026, but Wikipedia pages change format from time to time, so the code is brittle and may not keep working indefinitely:</p>

<figure class="highlight"><pre>library(rvest)
library(tidyverse)
library(janitor)
library(slider) # for rolling sum
library(scales)
library(ggrepel)
library(kableExtra)

#-----------------Import and process data------------------------

url &lt;- &quot;https://en.wikipedia.org/wiki/List_of_prime_ministers_of_the_United_Kingdom&quot;

page &lt;- read_html(url)

# The main PM table is the first wikitable on the page
pm_table &lt;- page |&gt;
  html_element(&quot;table.wikitable&quot;) |&gt;
  html_table() |&gt; 
  clean_names() |&gt; 
  select(
    pm = prime_minister_office_lifespan,
    start = term_of_office,
    end = term_of_office_2
  ) |&gt; 
  # drop second line of column titles:
  slice(-1) |&gt; 
  # find the PMs' names - everything up to the first [
  mutate(pm = str_extract(pm, &quot;.*?\\[&quot;),
         pm = str_replace(pm, &quot;\\[&quot;, &quot;&quot;)) |&gt; 
  # strip all the footnotes and stuff from the dates:
  mutate(across(everything(), ~ str_remove_all(.x, &quot;\\[.*?\\]&quot;))) |&gt;  # remove [1] refs
  mutate(across(everything(), str_squish)) |&gt; 
  mutate(start = as.Date(start, format = &quot;%d %B %Y&quot;),
         end = as.Date(end, format = &quot;%d %B %Y&quot;),
         # Starmer has announced his resignation but has no end date yet,
         # so assume one for now:
         end = if_else(is.na(end) &amp; pm == &quot;Keir Starmer&quot;,
                       as.Date(&quot;2026-07-10&quot;),
                       end),
         duration = as.numeric(end - start)) |&gt;  # duration in days
  distinct() |&gt; 
  # order the PM factor levels by start date, most recent first:
  mutate(pm = fct_reorder(pm, start, .desc = TRUE)) |&gt; 
  group_by(pm) |&gt; 
  # end of each PM's final term, used later to position text labels:
  mutate(last_end = max(end)) |&gt; 
  ungroup()</pre></figure>

<h2 id="most-and-longest-serving-prime-ministers">Most and longest serving prime ministers</h2>

<p>This lets us do some simple analysis. First, here are the UK prime ministers who have served most often, that is, those with more than one term (durations are in days):</p>

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Prime minister </th>
   <th style="text-align:right;"> Terms </th>
   <th style="text-align:left;"> Earliest start </th>
   <th style="text-align:left;"> Latest finish </th>
   <th style="text-align:right;"> Total duration </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> William Ewart Gladstone </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:left;"> 1868-12-03 </td>
   <td style="text-align:left;"> 1894-03-02 </td>
   <td style="text-align:right;"> 4508 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Edward Smith-Stanley </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> 1852-02-23 </td>
   <td style="text-align:left;"> 1868-02-25 </td>
   <td style="text-align:right;"> 1381 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Robert Gascoyne-Cecil </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> 1885-06-23 </td>
   <td style="text-align:left;"> 1902-07-11 </td>
   <td style="text-align:right;"> 5000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Stanley Baldwin </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> 1923-05-22 </td>
   <td style="text-align:left;"> 1937-05-28 </td>
   <td style="text-align:right;"> 2639 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thomas Pelham-Holles </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1754-03-16 </td>
   <td style="text-align:left;"> 1762-05-26 </td>
   <td style="text-align:right;"> 2763 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Charles Watson-Wentworth </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1765-07-13 </td>
   <td style="text-align:left;"> 1782-07-01 </td>
   <td style="text-align:right;"> 478 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> William Cavendish-Bentinck </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1783-04-02 </td>
   <td style="text-align:left;"> 1809-10-04 </td>
   <td style="text-align:right;"> 1178 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> William Pitt the Younger </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1783-12-19 </td>
   <td style="text-align:left;"> 1806-01-23 </td>
   <td style="text-align:right;"> 6917 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Arthur Wellesley </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1828-01-22 </td>
   <td style="text-align:left;"> 1834-12-09 </td>
   <td style="text-align:right;"> 1051 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> William Lamb </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1834-07-16 </td>
   <td style="text-align:left;"> 1841-08-30 </td>
   <td style="text-align:right;"> 2447 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Robert Peel </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1834-12-10 </td>
   <td style="text-align:left;"> 1846-06-29 </td>
   <td style="text-align:right;"> 1883 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Henry John Temple </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1855-02-06 </td>
   <td style="text-align:left;"> 1865-10-18 </td>
   <td style="text-align:right;"> 3429 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Benjamin Disraeli </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1868-02-27 </td>
   <td style="text-align:left;"> 1880-04-21 </td>
   <td style="text-align:right;"> 2530 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Ramsay MacDonald </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1924-01-22 </td>
   <td style="text-align:left;"> 1935-06-07 </td>
   <td style="text-align:right;"> 2480 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Winston Churchill </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1940-05-10 </td>
   <td style="text-align:left;"> 1955-04-05 </td>
   <td style="text-align:right;"> 3160 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Harold Wilson </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1964-10-16 </td>
   <td style="text-align:left;"> 1976-04-05 </td>
   <td style="text-align:right;"> 2835 </td>
  </tr>
</tbody>
</table>

<p>Since the mid-twentieth century, only Churchill and Wilson have had a second spell as prime minister. In the nineteenth century it was much more common, with big names like Gladstone, Disraeli and Gascoyne-Cecil dominating politics both in and out of government.</p>

<p>Here are the ten who have served the longest in total (durations in days):</p>

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Prime minister </th>
   <th style="text-align:right;"> Terms </th>
   <th style="text-align:left;"> Earliest start </th>
   <th style="text-align:left;"> Latest finish </th>
   <th style="text-align:right;"> Total duration </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Robert Walpole </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1721-04-03 </td>
   <td style="text-align:left;"> 1742-02-11 </td>
   <td style="text-align:right;"> 7619 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> William Pitt the Younger </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1783-12-19 </td>
   <td style="text-align:left;"> 1806-01-23 </td>
   <td style="text-align:right;"> 6917 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Robert Jenkinson </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1812-06-08 </td>
   <td style="text-align:left;"> 1827-04-09 </td>
   <td style="text-align:right;"> 5418 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Robert Gascoyne-Cecil </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> 1885-06-23 </td>
   <td style="text-align:left;"> 1902-07-11 </td>
   <td style="text-align:right;"> 5000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> William Ewart Gladstone </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:left;"> 1868-12-03 </td>
   <td style="text-align:left;"> 1894-03-02 </td>
   <td style="text-align:right;"> 4508 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Frederick North </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1770-01-28 </td>
   <td style="text-align:left;"> 1782-03-27 </td>
   <td style="text-align:right;"> 4441 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Margaret Thatcher </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1979-05-04 </td>
   <td style="text-align:left;"> 1990-11-28 </td>
   <td style="text-align:right;"> 4226 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Henry Pelham </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1743-08-27 </td>
   <td style="text-align:left;"> 1754-03-06 </td>
   <td style="text-align:right;"> 3844 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tony Blair </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1997-05-02 </td>
   <td style="text-align:left;"> 2007-06-27 </td>
   <td style="text-align:right;"> 3708 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Henry John Temple </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1855-02-06 </td>
   <td style="text-align:left;"> 1865-10-18 </td>
   <td style="text-align:right;"> 3429 </td>
  </tr>
</tbody>
</table>

<p>The first UK prime minister, Robert Walpole, was also the longest serving. From the 20th and 21st centuries, only Thatcher and Blair make the top ten list.</p>

<p>Those two simple tables were produced with this code:</p>

<figure class="highlight"><pre>#------------summary highlights----------

# prime ministers' number of terms and total duration (in days):
pm_summary &lt;- pm_table |&gt; 
  rename(`Prime minister` = pm) |&gt; 
  group_by(`Prime minister`) |&gt; 
  summarise(Terms = n(),
            `Earliest start` = min(start),
            `Latest finish` = max(end),
            `Total duration` = sum(duration)) |&gt; 
  arrange(desc(Terms), `Earliest start`)

# Prime ministers with more than one term:
pm_summary |&gt; 
  filter(Terms &gt; 1) |&gt; 
  kable() |&gt; 
  kable_styling()

# Longest serving prime ministers (top ten by total duration):
pm_summary |&gt; 
  arrange(desc(`Total duration`)) |&gt; 
  slice_head(n = 10) |&gt; 
  kable() |&gt; 
  kable_styling()</pre></figure>

<h2 id="graphic-summaries">Graphic summaries</h2>

<p>Tables are nice but graphics are better. Here is my attempt to summarise all the prime ministers of the UK (and of the predecessor Kingdom of Great Britain) in one picture. You probably need a full-sized screen for this, but with the right display I think the Gantt-chart style works nicely.</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0323-gantt.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0323-gantt.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>That chart was produced with this code. On top of my usual blog style, there are a few polishing details to minimise clutter. The y-axis labels are suppressed and added instead as text close to the data, which is much easier to read, and the horizontal gridlines are suppressed.</p>

<figure class="highlight"><pre>#--------------Draw plots-------------
the_title &lt;- &quot;Prime ministers of the United Kingdom and its predecessors, 1721 to 2026&quot;

# Gantt-style chart: one horizontal segment per term of office
pm_table |&gt; 
  ggplot(aes(y = pm, yend = pm)) +
  geom_segment(aes(x = start, xend = end),
               linewidth = 2, colour = &quot;steelblue&quot;) +
  # label each PM once, 500 days after the end of their final term:
  geom_text(data = distinct(pm_table, pm, last_end),
            aes(label = pm, x = last_end + 500),
            size = 2, hjust = 0, colour = &quot;grey50&quot;) +
  scale_x_date(
    breaks = seq(as.Date(&quot;1720-01-01&quot;), as.Date(&quot;2035-01-01&quot;), by = &quot;20 years&quot;),
    date_labels = &quot;%Y&quot;,
    # add a secondary (top) axis mirroring the bottom one:
    sec.axis = sec_axis(~.)
  ) + 
  labs(x = &quot;Year&quot;,
       y = &quot;&quot;,
       title = the_title) +
  # the text labels above replace the y-axis labels:
  theme(axis.text.y = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.border = element_blank(), 
        axis.ticks.y = element_blank())</pre></figure>

<p>Second, it seems highly relevant to plot the distribution of term durations:</p>
<object type="image/svg+xml" data="https://freerangestats.info/img/0323-density.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0323-density.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>And one showing the trend (or lack of trend) in durations over time:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0323-over-time.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0323-over-time.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Those two simple plots were produced with this code. Perhaps the only point of particular interest here is how I used a subset of the data to label only the prime ministers whose term lasted less than 120 days or more than 3,000 days:</p>

<figure class="highlight"><pre>pm_table |&gt; 
  ggplot(aes(x = duration)) +
  geom_density(colour = &quot;steelblue&quot;) +
  geom_rug(colour = &quot;steelblue&quot;) +
  scale_x_continuous(labels = comma) +
  labs(x = &quot;Duration in days&quot;,
       title = the_title)

pm_table |&gt; 
  ggplot(aes(x = start, y = duration)) +
  geom_smooth(method = &quot;gam&quot;, colour = &quot;white&quot;) +
  geom_point(colour = &quot;steelblue&quot;) +
  geom_text_repel(data = filter(pm_table, duration &lt; 120 | duration &gt; 3000), 
                   aes(label = pm), size = 2.8, seed = 123) +
  scale_y_sqrt(breaks = c(0.5, 1, 1:4 * 2) * 1000, labels = comma) +
  labs(x = &quot;Starting date of premiership&quot;,
       y = &quot;Duration in days&quot;,
       title = the_title,
       subtitle = &quot;Durations shown are of individual periods in office, not lifetime totals.&quot;)</pre></figure>

<p>Finally, the big question that seems to get a lot of attention: how many prime ministers per decade? Below is my effort at calculating and presenting this.</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0323-rolling-pms.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0323-rolling-pms.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>We can see that we are indeed going through a decade that is rich in UK prime ministers (and will be richer still in a month or so). But it’s not unprecedented. We’ve been at similar levels several times in the past, and in the politically turbulent 1830s there were even more premierships.</p>

<p>In fact, in the late twentieth century, with Thatcher and Blair, the UK had a period of unusually slow turnover of prime ministers. But that was a formative period in the lives of many of today’s political commentators, so it’s not surprising that the current rapid turnover feels so unusual to them.</p>

<p>Code for this is below. Note that I calculated this on a daily basis. I’m not 100% sure I’ve got it right, but it passes my simplest reality checks (e.g. manually counting those we’ve had in the past ten years: six so far, although expected soon to become seven).</p>

<figure class="highlight"><pre>cumulative_pms &lt;- pm_table |&gt; 
  full_join(tibble(start = seq(from = min(pm_table$start), 
                               to = max(pm_table$end), 
                               by = &quot;1 day&quot;))) |&gt; 
  arrange(start) |&gt; 
  mutate(starting_pms = if_else(is.na(pm), 0, 1),
         rolling_pms = slide_sum(starting_pms, before = 3653),
         # I'm not sure I've got this right yet, but the idea is that on any given day,
         # the number of PMs in the past 10 years is however many started in those 10 years,
         # plus the 1 PM that you came into the period with. The exception is the time
         # of the very first prime minister, for which you only have the rolling sum
         # of PMs that started:
         rolling_pms = if_else(start &lt; (pm_table[1, ]$start + 3654), 
                               rolling_pms, 
                               rolling_pms + 1))

# When was the peak number of PMs in the last decade:
arrange(cumulative_pms, desc(rolling_pms))


cumulative_pms |&gt; 
  ggplot(aes(x = start, y = rolling_pms)) +
  geom_line(colour = &quot;steelblue&quot;) +
  scale_y_continuous(breaks = 0:max(cumulative_pms$rolling_pms)) +
  labs(x = &quot;&quot;,
       y = &quot;Number of prime ministers in past 10 years&quot;,
       title = the_title,
       subtitle = &quot;Peak prime ministers per decade was in the 1830s&quot;)</pre></figure>
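<p>As an optional cross-check on the daily rolling count (not part of the original analysis), you could count, for each day, the terms that overlap the preceding ten-year window. The sketch below assumes each term is one row with <code>start</code> and <code>end</code> dates. It uses three hypothetical toy terms rather than the real <code>pm_table</code>, and the helper <code>pms_in_window()</code> is an illustrative name of my own.</p>

```r
# Cross-check sketch: count the terms that overlap each rolling 10-year window.
# The data below are hypothetical toy terms, not the real pm_table.
terms <- data.frame(
  pm    = c("A", "B", "C"),
  start = as.Date(c("2000-01-01", "2008-06-01", "2015-03-01")),
  end   = as.Date(c("2008-06-01", "2015-03-01", "2020-01-01"))
)

# A term counts for `day` if it began on or before `day` and ended after
# the window opened `window_days` earlier.
pms_in_window <- function(day, terms, window_days = 3653) {
  window_start <- day - window_days
  sum(terms$start <= day & terms$end > window_start)
}

days <- seq(min(terms$start), max(terms$end), by = "1 day")
rolling <- vapply(days, pms_in_window, integer(1), terms = terms)

# Peak count and the first day it was reached
max(rolling)
days[which.max(rolling)]
```

<p>Because this version counts overlapping terms directly, it needs no special case for the very first prime minister. On a handover day, both the outgoing and the incoming prime minister are counted.</p>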

<p>That’s all for now.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://freerangestats.info/blog/2026/06/23/uk-prime-ministers"> free range statistics - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/united-kingdom-prime-ministers-by-ellis2013nz/">United Kingdom prime ministers by @ellis2013nz</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402185</post-id>	</item>
		<item>
		<title>PCA in R: Principal Component Analysis Step-by-Step (prcomp + ggplot2)</title>
		<link>https://www.r-bloggers.com/2026/06/pca-in-r-principal-component-analysis-step-by-step-prcomp-ggplot2/</link>
		
		<dc:creator><![CDATA[Unknown]]></dc:creator>
		<pubDate>Sat, 20 Jun 2026 19:23:48 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=fb38f477d6919aff2bb4896aae300ed5</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> How to Perform PCA in R: A Step-by-Step Tutorial Using prcomp()<br />
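To perform PCA in R, use the built-in prcomp() function: pca &lt;- prcomp(data, scale = TRUE), then run summary(pca) to see how much variance each component explains. You can decide how many components to keep using a scree plot, the Kaiser rule (eigenvalue &gt; 1), or a cumulative variance threshold; this tutorial shows all three and where they disagree.<br />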
PCA is not the same as Factor Analysis. The comparison table below explains exactly ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/pca-in-r-principal-component-analysis-step-by-step-prcomp-ggplot2/">PCA in R: Principal Component Analysis Step-by-Step (prcomp + ggplot2)</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html"> RStudioDataLab</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<h2>How to Perform PCA in R: A Step-by-Step Tutorial Using prcomp()</h2>
<p style="text-align: justify;">To <a href="https://www.rstudiodatalab.com/2023/12/how-to-perform-ANCOVA-with-r.html" rel="nofollow" target="_blank">perform</a> <a href="https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html" rel="nofollow" target="_blank">PCA</a> in R, use the built-in <code>prcomp()</code> function: <code>pca &lt;- prcomp(data, scale = TRUE)</code>, then run <code>summary(pca)</code> to see how much variance each component explains. <code>prcomp()</code> handles centering, scaling, and the underlying singular value decomposition for you, and returns the loadings (<code>pca$rotation</code>) and component scores (<code>pca$x</code>) you need for any further <a href="https://www.rstudiodatalab.com/2023/09/Exploratory-Factor-Analysis.html" rel="nofollow" target="_blank">analysis</a> or <a href="https://www.rstudiodatalab.com/2023/11/create-stunning-data-visualization-in-r.html" rel="nofollow" target="_blank">visualization</a>.</p>


<b><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f4cb.png" alt="📋" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Complete Code — Copy, Paste, Run</b><br>
<pre># Complete PCA in R script -- copy, paste, run top to bottom

# 1. Install and load required packages
install.packages(c(&quot;ggplot2&quot;, &quot;factoextra&quot;))
library(ggplot2)
library(factoextra)
#2. Load the data and check assumptions
data(USArrests)
str(USArrests)
cor(USArrests)
apply(USArrests, 2, shapiro.test)
#3. Scale the data (prcomp() can also do this inline with scale = TRUE)
data_scaled &lt;- scale(USArrests)
head(data_scaled)
#4. Run PCA
pca &lt;- prcomp(USArrests, scale = TRUE)
summary(pca)
pca$rotation
#5. Decide how many components to keep
plot(pca$sdev^2, type = &quot;b&quot;, xlab = &quot;Principal Component&quot;, ylab = &quot;Eigenvalue&quot;, main = &quot;Scree Plot&quot;)
fviz_eig(pca, addlabels = TRUE, barfill = &quot;steelblue&quot;, barcolor = &quot;steelblue&quot;,
         linecolor = &quot;firebrick&quot;, main = &quot;Scree Plot: Variance Explained by Component&quot;)
pve &lt;- pca$sdev^2 / sum(pca$sdev^2)
cum_pve &lt;- cumsum(pve)
plot(cum_pve, type = &quot;b&quot;, xlab = &quot;Principal Component&quot;,
     ylab = &quot;Cumulative Proportion of Variance Explained&quot;)
#6. Visualize results
biplot(pca, scale = 0)
fviz_pca_biplot(pca,
                geom.ind = &quot;point&quot;,
                col.ind = as.factor(state.region),
                palette = &quot;jco&quot;,
                addEllipses = TRUE,
                col.var = &quot;black&quot;,
                repel = TRUE,
                legend.title = &quot;Region&quot;) +
  theme_minimal()</pre>


<h2>Key Takeaways</h2>
<ul>
<li>Principal component analysis (PCA) in R turns a set of correlated variables into a smaller set of uncorrelated <b>principal components</b> that capture most of the original variance.</li>
<li>The built-in <code>prcomp()</code> function is the preferred way to run PCA in R because it applies singular value decomposition (SVD) to the data matrix. This is numerically more stable than the eigendecomposition of the covariance or correlation matrix used by the older <code>princomp()</code> function.</li>
<li>On the classic <code>USArrests</code> dataset, the first two principal components explain <b>86.75%</b> of the total variance — verified directly from R output further down this page.</li>
<li>You can decide how many components to keep using a scree plot, the Kaiser rule (eigenvalue > 1), or a cumulative variance threshold — this tutorial shows all three and where they disagree.</li>
<li>PCA is not the same as Factor Analysis. The comparison table below explains exactly when to use each one.</li>
</ul>

<p style="text-align: justify;"></p><a href="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiamON4oF0DKcNhjVbQSsylU98BbmjsLdQs2KSj1YPqDpVSYdCosfznUHzBe2u6GU7mDADf5vSRJi1Hzjy9zi5UuYKjw38JKcmIs83LwQNxbtc4j_G7iw1NNWdvgbsXCxWtTAM0rvI8l96PwyH9UjyJ7aPp0jzTiKbXrvi0JW7Qo2heWMsgseSRpwUEHRg/s1200/Principal%20Component%20Analysis%20in%20R%20I%20PCA%20Explained.png?ssl=1" rel="nofollow" target="_blank"><img src="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiamON4oF0DKcNhjVbQSsylU98BbmjsLdQs2KSj1YPqDpVSYdCosfznUHzBe2u6GU7mDADf5vSRJi1Hzjy9zi5UuYKjw38JKcmIs83LwQNxbtc4j_G7iw1NNWdvgbsXCxWtTAM0rvI8l96PwyH9UjyJ7aPp0jzTiKbXrvi0JW7Qo2heWMsgseSRpwUEHRg/s16000/Principal%20Component%20Analysis%20in%20R%20I%20PCA%20Explained.png?w=578&#038;ssl=1" alt="PCA in R step-by-step tutorial using prcomp and ggplot2" class="full" data-recalc-dims="1"></a><br>

<p style="text-align: justify;">If you have a dataset with many correlated <a href="https://www.rstudiodatalab.com/2023/12/Create-New-Variables-in-R-with-dplyr.html" rel="nofollow" target="_blank">variables</a> and want to find the ones that actually matter, reduce its complexity, and visualize it in two dimensions without losing the patterns hidden inside it — <b>PCA in R</b> is the tool for the job.</p>

<p style="text-align: justify;">My name is Dr Zubair Goraya. I have a PhD-level background in <a href="https://www.rstudiodatalab.com/2023/06/Descriptive-Analysis-RStudio.html" rel="nofollow" target="_blank">statistics</a> and have used R for statistical consulting and <a href="https://www.rstudiodatalab.com/2024/03/essential-rstudio-FAQs.html" rel="nofollow" target="_blank">research</a> for several years. I ran into the same questions you probably have now (how many components to keep, how to read the <code>prcomp()</code> output, how to build a biplot with <code>ggplot2</code>) while working through PCA for my own research. I have since helped thesis and dissertation students work through the exact same problem.</p>

<h2>What Is Principal Component Analysis (PCA)?</h2>
<p style="text-align: justify;">Principal component analysis is a statistical technique for <b><a href="https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html" rel="nofollow" target="_blank">dimensionality reduction</a></b>. It takes a set of variables that are correlated with one another and re-expresses them as a smaller set of new, uncorrelated variables called <b>principal components</b>. The first principal component (PC1) captures the largest possible amount of variance in the data; the second (PC2) captures the largest amount of the variance left over after PC1, and so on, with each component orthogonal to the ones before it.</p>

<a href="https://i2.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQqrhiUbZUYHshJlF6vAA5ZF2mFy6bxssGmHPQq-p_fMTVSq-O6M-zrt1D3nCBpI8ZNjb5BelUOYVyJBwp_wE4Js4Ek_5CVNwP9YWf8_svDrSrkZicDDmA5vV9_U6fIEDsPn_UeRR8XK1rB6YDbsMvTPnYscT_ZZCjDQKguw3Op6K6MziPSYpe9P6xXrA/s2559/napkin-selection.png?ssl=1" rel="nofollow" target="_blank"><img src="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQqrhiUbZUYHshJlF6vAA5ZF2mFy6bxssGmHPQq-p_fMTVSq-O6M-zrt1D3nCBpI8ZNjb5BelUOYVyJBwp_wE4Js4Ek_5CVNwP9YWf8_svDrSrkZicDDmA5vV9_U6fIEDsPn_UeRR8XK1rB6YDbsMvTPnYscT_ZZCjDQKguw3Op6K6MziPSYpe9P6xXrA/w640-h424/napkin-selection.png?w=578&#038;ssl=1" alt="When to use PCA in R for dimensionality reduction" class="full" data-recalc-dims="1"></a>

<h3>How PCA Works Under the Hood</h3>
<p style="text-align: justify;">You don't need to compute this by hand — <code>prcomp()</code> does it for you — but <a href="https://www.rstudiodatalab.com/2024/02/linear-discriminant-analysis-LDA.html" rel="nofollow" target="_blank">understanding</a> the four steps helps you interpret the output correctly:</p>
<ol>
<li>Standardize each variable (mean 0, standard deviation 1) so no single variable dominates because of its scale.</li>
<li>Compute the <b>covariance matrix</b> of the standardized variables, which captures how every pair of variables moves together.</li>
<li>Decompose that matrix into <b>eigenvectors</b> (the direction of each principal component) and <b>eigenvalues</b> (how much variance that direction explains). The eigenvector with the largest eigenvalue is PC1.</li>
<li>Project the original data onto the new eigenvector axes. This produces the <b>component scores</b> — the new coordinates for every observation in the reduced space.</li>
</ol>
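<p style="text-align: justify;"><code>prcomp()</code> reaches the same result by a different route. Instead of eigendecomposing the covariance matrix, it applies a singular value decomposition (SVD) to the centered (and, if requested, scaled) data matrix. The right singular vectors are the eigenvectors, and the squared singular values divided by n - 1 are the eigenvalues. This route is numerically more stable, which is why <code>prcomp()</code> is preferred over <code>princomp()</code> in R.</p>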
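<p style="text-align: justify;">To confirm that the four steps reproduce <code>prcomp()</code>, here is a minimal base-R sketch that carries them out by hand on <code>USArrests</code>. The object names <code>X</code>, <code>S</code>, and <code>eig</code> are my own. Loadings and scores are compared in absolute value because the sign of each component is arbitrary.</p>

```r
# Reproduce prcomp() by hand: standardize, covariance, eigendecomposition, projection
X <- scale(USArrests)                  # step 1: mean 0, sd 1 for every column
S <- cov(X)                            # step 2: covariance of the standardized data
eig <- eigen(S)                        # step 3: eigenvectors + eigenvalues, largest first
scores_manual <- X %*% eig$vectors     # step 4: project observations onto the new axes

pca <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues equal the squared standard deviations reported by prcomp()
all.equal(eig$values, pca$sdev^2)

# Eigenvectors and scores agree with prcomp() up to the sign of each column
all.equal(abs(eig$vectors), abs(unname(pca$rotation)))
all.equal(abs(unname(scores_manual)), abs(unname(pca$x)))
```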

<h2>What You Need Before Running PCA in R</h2>
<p style="text-align: justify;">PCA only works on <b>numeric, continuous</b> variables. You'll use:</p>
<ul>
<li><code>prcomp()</code> — built into base R's <code>stats</code> package, no installation needed.</li>
<li><a href="https://www.rstudiodatalab.com/2024/04/how-to-install-ggplot2-in-r.html" rel="nofollow" target="_blank"><code>ggplot2</code></a> — for the custom scree plot and biplot in this tutorial.</li>
<li><code>factoextra</code> — a <code>ggplot2</code>-based wrapper that turns PCA output into publication-ready plots with a single function call.</li>
</ul>
<pre>install.packages(c(&quot;ggplot2&quot;, &quot;factoextra&quot;))
library(ggplot2)
library(factoextra)</pre>
<p style="text-align: justify;">This tutorial uses <code>USArrests</code>, a dataset built into R. It records arrests per 100,000 residents for murder, assault, and rape, plus the percentage of the population living in urban areas, for all 50 US states in 1973. It is the same dataset used in <i>An Introduction to Statistical Learning</i>, so you can cross-check your output against a well-known reference.</p>

<h2>Step 1: Check Your PCA Assumptions in R</h2>
<p style="text-align: justify;">PCA works best when your variables are numeric, continuous, and linearly related. Approximate normality also helps, although PCA used as a descriptive technique does not strictly require it. Check these properties before you run anything:</p>
<pre>data(USArrests)
str(USArrests)                       # numeric and continuous?
cor(USArrests)                       # linearly related?
apply(USArrests, 2, shapiro.test)    # normally distributed?</pre>
<p style="text-align: justify;">Running a <a href="https://www.rstudiodatalab.com/2024/10/shapiro-wilk-normality-test-shapirotest.html" rel="nofollow" target="_blank">Shapiro-Wilk normality test</a> on each variable shows that <code>Assault</code> (p = 0.041) and <code>Rape</code> (p = 0.025) depart significantly from normality at the 0.05 level. <code>Murder</code> (p = 0.067) and <code>UrbanPop</code> (p = 0.439) do not:</p>


<table>
<thead><tr><th>Variable</th><th>W</th><th>p-value</th><th>Departs from normality (p &lt; 0.05)?</th></tr></thead>
<tbody>
<tr><td>Murder</td><td>0.957</td><td>0.067</td><td>No</td></tr>
<tr><td>Assault</td><td>0.952</td><td>0.041</td><td>Yes</td></tr>
<tr><td>UrbanPop</td><td>0.977</td><td>0.439</td><td>No</td></tr>
<tr><td>Rape</td><td>0.947</td><td>0.025</td><td>Yes</td></tr>
</tbody>
</table>
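<p style="text-align: justify;">If you would rather collect the test results into a single table than read four separate printouts, a small sketch like this works. The column names are my own choice.</p>

```r
# Run shapiro.test() on every column and collect W and p into one data frame
sw <- lapply(USArrests, shapiro.test)

sw_table <- data.frame(
  variable = names(sw),
  W        = vapply(sw, function(t) unname(t$statistic), numeric(1)),
  p_value  = vapply(sw, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
sw_table$departs_from_normality <- sw_table$p_value < 0.05

print(sw_table, digits = 3)
```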


<p style="text-align: justify;">PCA is fairly robust to mild non-normality, especially with n = 50, so this tutorial proceeds with all four variables rather than dropping any. If your own data fails normality more severely, consider a transformation first, or revisit whether PCA is the right tool at all — see the PCA vs. <a href="https://www.rstudiodatalab.com/2023/08/Confirmatory-Exploratory-Factor-Analysis.html" rel="nofollow" target="_blank">Factor Analysis comparison</a> further down this page.</p>

<h2>Step 2: Scale and Center Your Data</h2>
<p style="text-align: justify;">Murder, Assault, and Rape are measured per 100,000 residents, while UrbanPop is a percentage, so the variables sit on entirely different scales. Without scaling, <code>Assault</code> (values up to 337) would dominate the first component purely because of its larger numbers, not because it is more important. <code>prcomp()</code> has a built-in <code>scale.</code> argument that handles this for you. This tutorial writes it as <code>scale = TRUE</code>, which R partially matches to <code>scale.</code>. You can also <a href="https://www.rstudiodatalab.com/2023/08/how-to-normalize-data-r-my-data.html" rel="nofollow" target="_blank">normalize the data</a> manually first using <a href="https://www.rstudiodatalab.com/2023/10/Z-Score-in-R.html" rel="nofollow" target="_blank">Z-score standardization</a> if you want to inspect the scaled values:</p>
<pre>data_scaled &lt;- scale(USArrests)
head(data_scaled)</pre>
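<p style="text-align: justify;">A quick way to see why scaling matters is to compare the raw variances and to run <code>prcomp()</code> once without scaling. This sketch (object names such as <code>pca_raw</code> are illustrative) also confirms that <code>scale()</code> produced columns with mean 0 and standard deviation 1:</p>

```r
# Raw variances: Assault's is far larger than the others'
round(apply(USArrests, 2, var), 1)

# After scale(), every column has mean 0 and standard deviation 1
data_scaled <- scale(USArrests)
round(colMeans(data_scaled), 10)
apply(data_scaled, 2, sd)

# Without scaling, PC1 is essentially the Assault variable on its own
pca_raw <- prcomp(USArrests, scale. = FALSE)
round(pca_raw$rotation[, "PC1"], 3)
```

<p style="text-align: justify;">In the unscaled fit, the magnitude of Assault's PC1 loading is above 0.99, so that "component" is little more than the Assault column.</p>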

<h2>Step 3: Run PCA in R With prcomp()</h2>
<p style="text-align: justify;">Now run PCA on the full four-variable dataset, with scaling handled inline:</p>
<pre>pca &lt;- prcomp(USArrests, scale = TRUE)
summary(pca)</pre>
<p style="text-align: justify;">This is the actual <code>summary(pca)</code> output for <code>USArrests</code>:</p>


<table>
<thead><tr><th>Importance of components</th><th>PC1</th><th>PC2</th><th>PC3</th><th>PC4</th></tr></thead>
<tbody>
<tr><td><b>Standard deviation</b></td><td>1.5749</td><td>0.9949</td><td>0.5971</td><td>0.4164</td></tr>
<tr><td><b>Proportion of Variance</b></td><td>0.6201</td><td>0.2474</td><td>0.0891</td><td>0.0434</td></tr>
<tr><td><b>Cumulative Proportion</b></td><td>0.6201</td><td>0.8675</td><td>0.9566</td><td>1.0000</td></tr>
</tbody>
</table>


<p style="text-align: justify;">PC1 alone explains 62.0% of the total variance; PC1 and PC2 together explain <b>86.75%</b>. That means you can reduce four variables down to two principal components and still retain most of the information in the original data.</p>
<p style="text-align: justify;">Looking at <code>pca$rotation</code> shows what each component represents. PC1 loads heavily and negatively on <code>Murder</code> (-0.536), <code>Assault</code> (-0.583), and <code>Rape</code> (-0.543), with a smaller loading on <code>UrbanPop</code> (-0.278), so PC1 is best read as an overall <b>violent crime</b> axis. PC2 is dominated by <code>UrbanPop</code> (0.873), with a smaller contribution from <code>Murder</code> (about 0.42 in magnitude, opposite in sign to <code>UrbanPop</code>), so it is mainly an <b>urbanization</b> axis. The sign of each component is arbitrary: your R session may report every loading in a column with the opposite sign, and the interpretation is unchanged.</p>
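<p style="text-align: justify;">The sketch below pulls the loadings for the two retained components and lists the variable with the largest absolute loading on each component. It can also, optionally, flip PC1 so that higher scores mean more violent crime. Flipping the sign of a component's loadings and scores together leaves the PCA solution unchanged. The name <code>top_var</code> is my own.</p>

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Loadings for the two retained components, rounded for reading
round(pca$rotation[, c("PC1", "PC2")], 3)

# Variable with the largest absolute loading on each component
top_var <- apply(abs(pca$rotation), 2, function(col) names(which.max(col)))
top_var

# Optional: flip PC1 so that higher scores mean more violent crime.
# Negating both loadings and scores gives an equivalent solution.
if (sum(pca$rotation[, "PC1"]) < 0) {
  pca$rotation[, "PC1"] <- -pca$rotation[, "PC1"]
  pca$x[, "PC1"] <- -pca$x[, "PC1"]
}
```

<p style="text-align: justify;">If you apply the flip, any plot you draw afterwards from this <code>pca</code> object will show PC1 mirrored relative to the figures on this page.</p>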

<h2>Step 4: Decide How Many Components to Keep</h2>
<p style="text-align: justify;">There is no single universal rule. In practice, researchers triangulate using three methods — and on this dataset, they don't all agree, which is itself a useful teaching example.</p>

<h3>Method 1: The Scree Plot</h3>
<pre>plot(pca$sdev^2, type = &quot;b&quot;, xlab = &quot;Principal Component&quot;, ylab = &quot;Eigenvalue&quot;, main = &quot;Scree Plot&quot;)</pre>
<a href="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqoJNsdwenBgvi1MxQUz2x5_0HzbwA0a2_dTjEeRLzYNGlquiq96iyS5JL8k54HJ-frPoi7XyEwyu-ZhVFfUdM4bGgqn2urM1vqTQHmQE2ni5EClHsvL0T_DzQIuM-R8QtVaMUe3Xv4A9rmg60Ad6tM_uguJJUtryYBElCnxFjkt7hLkKSsi9pCSYTkXc/s1200/Scree%20Plot.jpeg?ssl=1" rel="nofollow" target="_blank"><img src="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqoJNsdwenBgvi1MxQUz2x5_0HzbwA0a2_dTjEeRLzYNGlquiq96iyS5JL8k54HJ-frPoi7XyEwyu-ZhVFfUdM4bGgqn2urM1vqTQHmQE2ni5EClHsvL0T_DzQIuM-R8QtVaMUe3Xv4A9rmg60Ad6tM_uguJJUtryYBElCnxFjkt7hLkKSsi9pCSYTkXc/w640-h342/Scree%20Plot.jpeg?w=578&#038;ssl=1" alt="Scree plot of PCA eigenvalues in R showing the elbow point" class="full" data-recalc-dims="1"></a>
<p style="text-align: justify;">Look for the "<a href="https://www.rstudiodatalab.com/2023/08/elbow-method-R-optimal-number-clusters.html" rel="nofollow" target="_blank">elbow</a>" — the point where the line flattens out. The components before the elbow are worth keeping; the ones after add little.</p>

<h3>The ggplot2 / factoextra Version</h3>
<p style="text-align: justify;">For a publication-ready scree plot built on <code>ggplot2</code> instead of base <a href="https://www.rstudiodatalab.com/2023/06/ggplot2-comprehensive-guide-to-data.html" rel="nofollow" target="_blank">R graphics</a>, <code>factoextra</code> gives you a single-line solution:</p>
<pre>fviz_eig(pca, addlabels = TRUE, barfill = &quot;steelblue&quot;, barcolor = &quot;steelblue&quot;,
         linecolor = &quot;firebrick&quot;, main = &quot;Scree Plot: Variance Explained by Component&quot;)</pre>
<p style="text-align: justify;"><code>fviz_eig()</code> returns a standard <code>ggplot</code> object, so you can layer on any further <code>ggplot2</code> theming (<code>+ theme_minimal()</code>, custom labels, color palettes) exactly as you would with any other <code>ggplot2</code> chart.</p>

<h3>Method 2: The Kaiser Rule (Eigenvalue > 1)</h3>
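<p style="text-align: justify;">The Kaiser criterion says: keep any component whose eigenvalue (standard deviation squared) exceeds 1. The rule is intended for standardized data, where the eigenvalues sum to the number of variables, so 1 is the average eigenvalue. On this dataset:</p>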

<table>
<thead><tr><th>Component</th><th>Eigenvalue</th><th>Keep under Kaiser rule?</th></tr></thead>
<tbody>
<tr><td>PC1</td><td>2.480</td><td>Yes</td></tr>
<tr><td>PC2</td><td>0.990</td><td>No (just under 1)</td></tr>
<tr><td>PC3</td><td>0.357</td><td>No</td></tr>
<tr><td>PC4</td><td>0.173</td><td>No</td></tr>
</tbody>
</table>

<p style="text-align: justify;">This is a good example of why the Kaiser rule shouldn't be used in isolation: PC2's eigenvalue (0.990) is essentially 1, so a strict cutoff would discard a component that still explains nearly a quarter of total variance and has a clear, interpretable meaning (urbanization). Most applied researchers would retain PC2 here despite the Kaiser rule's technical "no."</p>
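<p style="text-align: justify;">The eigenvalues in the table can be computed directly from <code>pca$sdev</code>. This sketch (column names are my own) applies the Kaiser cutoff and confirms that the eigenvalues of standardized data sum to the number of variables:</p>

```r
pca <- prcomp(USArrests, scale. = TRUE)

eigenvalues <- pca$sdev^2
kaiser <- data.frame(
  component   = paste0("PC", seq_along(eigenvalues)),
  eigenvalue  = round(eigenvalues, 3),
  kaiser_keep = eigenvalues > 1
)
kaiser

# With standardized data the eigenvalues sum to the number of variables (4 here),
# so the Kaiser cutoff of 1 is simply the average eigenvalue.
sum(eigenvalues)
```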

<h3>Method 3: Cumulative Variance Threshold</h3>
<pre>pve &lt;- pca$sdev^2 / sum(pca$sdev^2)
cum_pve &lt;- cumsum(pve)
plot(cum_pve, type = &quot;b&quot;, xlab = &quot;Principal Component&quot;,
     ylab = &quot;Cumulative Proportion of Variance Explained&quot;)</pre>
<a href="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhAC035pgSnnCsFju-b9GgXCoBtxGxaxY1DO_9vyWfrTlErr0zBpNy1Uw68IfePiDE0XllTxgo1bXJ1os14W4l3mnupnvBmjX6pcJu4jqyj49KiJoQVupm5prPRd0g6F5_eOnkndV2eXrrn3spUOjScrDFTmlYCiUTsOM3OHob4c2G9WaW0x5hkz0IgBo/s1200/cumulative%20proportion%20of%20variance.jpeg?ssl=1" rel="nofollow" target="_blank"><img src="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhAC035pgSnnCsFju-b9GgXCoBtxGxaxY1DO_9vyWfrTlErr0zBpNy1Uw68IfePiDE0XllTxgo1bXJ1os14W4l3mnupnvBmjX6pcJu4jqyj49KiJoQVupm5prPRd0g6F5_eOnkndV2eXrrn3spUOjScrDFTmlYCiUTsOM3OHob4c2G9WaW0x5hkz0IgBo/w640-h342/cumulative%20proportion%20of%20variance.jpeg?w=578&#038;ssl=1" alt="Cumulative proportion of variance explained by each PCA component in R" class="full" data-recalc-dims="1"></a>
<p style="text-align: justify;">A common threshold is 80–90% cumulative variance. Here, two components clear 80% (86.75%), so <b>retaining PC1 and PC2</b> is the most defensible choice across all three methods combined.</p>
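<p style="text-align: justify;">Rather than reading the threshold off the plot, you can compute the smallest number of components that reaches it. This is a minimal sketch, assuming an 80% threshold. The <code>threshold</code> variable is illustrative; setting it to 0.90 would keep three components here.</p>

```r
pca <- prcomp(USArrests, scale. = TRUE)
pve <- pca$sdev^2 / sum(pca$sdev^2)
cum_pve <- cumsum(pve)

threshold <- 0.80                          # change to 0.90 for a stricter rule
n_keep <- which(cum_pve >= threshold)[1]   # smallest number of PCs reaching the threshold
n_keep
round(cum_pve[n_keep], 4)
```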

<h2>Step 5: Visualize PCA Results — Biplot in R</h2>
<p style="text-align: justify;">A biplot overlays the component scores (observations) and the loadings (original variables) on the same PC1/PC2 plane.</p>
<h3>Base R Biplot</h3>
<pre>biplot(pca, scale = 0)</pre>
<a href="https://i2.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiK3FwKbM1gZIRugAgvoKCnT-yaj0a5ocPmyuQilRZUot9cJNrKBo6YFDb3hD2P5ihp4QAYtt0qDiEAYlm-b1fAv4IDO319F7jwhhAkogtBALIJYkpa0sLmltAhb9uybI5aIGHeiZRV8CASpostlpN6B9ZAHMM8YFp14wGVOGD5es2cgN8AGdVFEGZ9hjQ/s1200/Principal%20component%20loadings%20plot%20for%20USArrests%20data.jpeg?ssl=1" rel="nofollow" target="_blank"><img src="https://i2.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiK3FwKbM1gZIRugAgvoKCnT-yaj0a5ocPmyuQilRZUot9cJNrKBo6YFDb3hD2P5ihp4QAYtt0qDiEAYlm-b1fAv4IDO319F7jwhhAkogtBALIJYkpa0sLmltAhb9uybI5aIGHeiZRV8CASpostlpN6B9ZAHMM8YFp14wGVOGD5es2cgN8AGdVFEGZ9hjQ/w640-h342/Principal%20component%20loadings%20plot%20for%20USArrests%20data.jpeg?w=578&#038;ssl=1" alt="Base R biplot of PCA loadings and scores for USArrests data" class="full" data-recalc-dims="1"></a>

<h3>PCA Biplot With ggplot2 (factoextra)</h3>
<p style="text-align: justify;">The base R biplot is functional but not very customizable. For a <code>ggplot2</code> biplot with colored points, <a href="https://www.rstudiodatalab.com/2023/10/confidence-intervals-in-r.html" rel="nofollow" target="_blank">confidence</a> ellipses, and repelled labels, use <code>fviz_pca_biplot()</code>:</p>
<pre>fviz_pca_biplot(pca,
                geom.ind = &quot;point&quot;,
                col.ind = as.factor(state.region),
                palette = &quot;jco&quot;,
                addEllipses = TRUE,
                col.var = &quot;black&quot;,
                repel = TRUE,
                legend.title = &quot;Region&quot;) +
  theme_minimal()</pre>
<p style="text-align: justify;">Because <code>fviz_pca_biplot()</code> returns a <code>ggplot</code> object, every <code>ggplot2</code> layer works: swap <code>palette</code>, add <code>+ labs(title = &quot;...&quot;)</code>, or change the theme without touching the underlying PCA math.</p>

<a href="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8G5nDfk_C9u2keAew24Em6HcRyTtSxtO9Z_0ixDC83B-9aH8fcVXoLAhDbv-MMoTbDnDL-jMfzEHOzOjeZNX5i2vfahJT0owDVZNBErukbv6e9AZ9YIIcmbvXWMqU1kJcjkQvi6dbALMlUMMFiSi2ZtzeFpXrP5cNTWAF3yg30_1Nr8JwD_vis7p5_yU/s937/principal%20component%20scores%20with%20ellipses%20by%20region.jpeg?ssl=1" rel="nofollow" target="_blank"><img src="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8G5nDfk_C9u2keAew24Em6HcRyTtSxtO9Z_0ixDC83B-9aH8fcVXoLAhDbv-MMoTbDnDL-jMfzEHOzOjeZNX5i2vfahJT0owDVZNBErukbv6e9AZ9YIIcmbvXWMqU1kJcjkQvi6dbALMlUMMFiSi2ZtzeFpXrP5cNTWAF3yg30_1Nr8JwD_vis7p5_yU/w640-h364/principal%20component%20scores%20with%20ellipses%20by%20region.jpeg?w=578&#038;ssl=1" alt="ggplot2 PCA biplot with confidence ellipses by US region" class="full" data-recalc-dims="1"></a>
<p style="text-align: justify;">Southern states <a href="https://www.rstudiodatalab.com/2023/07/cluster-analysis-in-data-mining-a-beginners-guide.html" rel="nofollow" target="_blank">cluster</a> toward the high-crime end of PC1 (the negative side when, as in Step 3, the crime loadings are negative). Western states spread further along PC2, the urbanization axis. These are the two axes the loadings suggested in Step 3.</p>
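<p style="text-align: justify;">If you prefer to build the biplot directly in <code>ggplot2</code> without <code>factoextra</code>, the sketch below plots the PC1/PC2 scores coloured by region and overlays the loadings as arrows. The arrow stretch factor (2.5) is an arbitrary choice for readability. The ellipses come from <code>stat_ellipse()</code>, whose defaults may differ from the ellipses <code>fviz_pca_biplot()</code> draws.</p>

```r
library(ggplot2)

pca <- prcomp(USArrests, scale. = TRUE)

# Observation scores on PC1/PC2; state.region uses the same alphabetical state order
scores <- data.frame(pca$x[, c("PC1", "PC2")], region = state.region)

# Variable loadings, stretched by an arbitrary factor so the arrows are visible
arrow_scale <- 2.5
loadings <- data.frame(pca$rotation[, c("PC1", "PC2")] * arrow_scale,
                       variable = rownames(pca$rotation))

p <- ggplot(scores, aes(x = PC1, y = PC2, colour = region)) +
  geom_hline(yintercept = 0, colour = "grey85") +
  geom_vline(xintercept = 0, colour = "grey85") +
  geom_point() +
  stat_ellipse(level = 0.95) +                 # stat_ellipse() default type: multivariate t
  geom_segment(data = loadings,
               aes(x = 0, y = 0, xend = PC1, yend = PC2),
               inherit.aes = FALSE,
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_text(data = loadings,
            aes(x = PC1 * 1.15, y = PC2 * 1.15, label = variable),
            inherit.aes = FALSE, size = 3.5) +
  labs(colour = "Region") +
  theme_minimal()

print(p)
```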

<h2>PCA vs. Factor Analysis: What's the Difference?</h2>
<p style="text-align: justify;">Students frequently confuse PCA with <a href="https://www.rstudiodatalab.com/2023/09/factor-analysis-and-principal-component-analysis.html" rel="nofollow" target="_blank">Factor Analysis</a> because both reduce a large set of variables down to a few. They are not interchangeable, and using the wrong one is a common mistake in thesis methodology chapters:</p>


<table>
<thead><tr><th></th><th>PCA</th><th>Factor Analysis</th></tr></thead>
<tbody>
<tr><td><b>Goal</b></td><td>Maximize explained variance; data reduction</td><td>Identify latent constructs that cause the observed variables</td></tr>
<tr><td><b>Variance modeled</b></td><td>All variance (common + unique)</td><td>Only shared (common) variance; unique/error variance is separated out</td></tr>
<tr><td><b>Assumes a latent variable?</b></td><td>No — purely a mathematical transformation</td><td>Yes — assumes observed variables reflect underlying factors</td></tr>
<tr><td><b>Rotation</b></td><td>Not typically used</td><td>Often rotated (varimax, promax) for interpretability</td></tr>
<tr><td><b>R function</b></td><td><code>prcomp()</code></td><td><code>factanal()</code> or <code>psych::fa()</code></td></tr>
<tr><td><b>Use when</b></td><td>You want to reduce dimensionality or visualize structure</td><td>You're testing a theory about what underlying constructs explain your items</td></tr>
</tbody>
</table>

<p style="text-align: justify;">If you're building a scale to measure a psychological construct (anxiety, job satisfaction, brand loyalty), you almost certainly want <a href="https://www.rstudiodatalab.com/2023/09/exploratory-factor-analysis-efa-in-r.html" rel="nofollow" target="_blank">Exploratory Factor Analysis</a>, not PCA — even though both are run on similarly structured survey data.</p>
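<p style="text-align: justify;">To see the "unique variance" row of the table in practice, you can fit a factor model to the same data with base R's <code>factanal()</code>. With only four variables, a one-factor model is the largest <code>factanal()</code> will accept, because two factors would leave negative degrees of freedom. Treat this sketch as illustrative rather than as a recommended analysis of <code>USArrests</code>:</p>

```r
# One-factor maximum-likelihood factor analysis (stats::factanal)
fa1 <- factanal(USArrests, factors = 1)
print(fa1$loadings, cutoff = 0)

# Uniqueness = share of each variable's variance NOT explained by the common factor
round(fa1$uniquenesses, 3)

# For comparison: the first principal component's loadings
pca <- prcomp(USArrests, scale. = TRUE)
round(pca$rotation[, "PC1"], 3)
```

<p style="text-align: justify;">PCA has no counterpart to the uniquenesses: with all components retained, it reproduces all of the variance, unique part included.</p>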

<h2>How to Report PCA Results in APA Style</h2>
<p style="text-align: justify;">For a thesis or journal manuscript, your methods and results sections should report PCA in a standard format. Based on the <code>USArrests</code> output above, here is a template you can adapt:</p>
<p style="text-align: justify;"><b>Sample write-up:</b><br>
"A principal component analysis (PCA) was conducted on four variables (Murder, Assault, and Rape arrest rates, and UrbanPop) using the <code>prcomp()</code> function in R (R Core Team). Variables were standardized prior to analysis. Two components were retained on the basis of a scree plot and an 80% cumulative variance threshold. Although the Kaiser criterion (eigenvalue > 1) strictly supported only one component, the second component's eigenvalue (0.99) was essentially 1. PC1 explained 62.0% of the variance and was driven primarily by Murder, Assault, and Rape (loadings of -0.54, -0.58, and -0.54, respectively), suggesting a general violent crime dimension. PC2 explained an additional 24.7% of the variance and loaded primarily on UrbanPop (0.87), representing an urbanization dimension. Together, the two components accounted for 86.75% of total variance."</p>
<p style="text-align: justify;">Always include a loadings table (as shown in Step 3) and either a scree plot or a cumulative-variance plot as a figure when reporting PCA in a formal write-up — reviewers and committee members expect to see the basis for your retention decision, not just the final component <a href="https://www.rstudiodatalab.com/2024/05/count-function-in-r-i-dplyrcount.html" rel="nofollow" target="_blank">count</a>.</p>
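<p style="text-align: justify;">You can pull the numbers for a write-up like this straight from R instead of copying them by hand. The minimal sketch below refits the same scaled PCA on <code>USArrests</code>. The object names (<code>pca</code>, <code>report</code>) are assumptions and may differ from the names used earlier in your script.</p>

```r
# Extract the figures typically reported in an APA-style PCA write-up
pca <- prcomp(USArrests, scale. = TRUE)

eigenvalues <- pca$sdev^2                      # variance of each component
prop_var    <- eigenvalues / sum(eigenvalues)  # proportion of variance explained
cum_var     <- cumsum(prop_var)                # cumulative proportion

report <- data.frame(
  component  = colnames(pca$rotation),
  eigenvalue = round(eigenvalues, 2),
  pct_var    = round(100 * prop_var, 1),
  cum_pct    = round(100 * cum_var, 1)
)
print(report)

# Loadings for the two retained components, rounded for a results table
print(round(pca$rotation[, 1:2], 2))
```

The same proportions are also available from <code>summary(pca)$importance</code>. Building the table yourself adds the eigenvalues, which the Kaiser criterion needs.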

<h2>Further Analysis: Using PCA Output Downstream</h2>
<p style="text-align: justify;">Once you have component scores (<code>pca$x</code>), they can serve as inputs to other analyses. For example, you can use them as features for <a href="https://www.rstudiodatalab.com/2023/08/elbow-method-R-optimal-number-clusters.html" rel="nofollow" target="_blank">k-means clustering</a> on a reduced, decorrelated dataset, or as uncorrelated replacements for predictors that suffer from <a href="https://www.rstudiodatalab.com/2023/07/Multicollinearity-Ridge-Regression.html" rel="nofollow" target="_blank">multicollinearity</a> when fitting a <a href="https://www.rstudiodatalab.com/2023/07/ridge-regression.html" rel="nofollow" target="_blank">regression</a> model.</p>
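<p style="text-align: justify;">Here is a minimal sketch of the clustering use case. It runs k-means on the first two component scores of <code>USArrests</code>. The choice of <code>centers = 3</code> and the seed are arbitrary assumptions for illustration. In practice you would choose k with a method such as the elbow plot.</p>

```r
# k-means clustering on the first two principal component scores
pca <- prcomp(USArrests, scale. = TRUE)
scores <- pca$x[, 1:2]          # 50 states x 2 decorrelated components

set.seed(123)                   # k-means uses random starting centers
km <- kmeans(scores, centers = 3, nstart = 25)

# Attach cluster labels to the scores for plotting or further analysis
clustered <- data.frame(state = rownames(scores), scores,
                        cluster = factor(km$cluster))
head(clustered)
table(clustered$cluster)        # cluster sizes
```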

<h2>Conclusion</h2>
<p style="text-align: justify;">You've now run a complete PCA in R: checking assumptions, scaling the data, fitting <code>prcomp()</code>, <a href="https://www.rstudiodatalab.com/2023/09/factor-analysis-and-principal-component-analysis.html" rel="nofollow" target="_blank">choosing</a> how many components to keep using three different methods, visualizing results in both base R and <code>ggplot2</code>, distinguishing PCA from <a href="https://www.rstudiodatalab.com/2023/09/How-I-Perform-Factor-Analysis-in-R.html" rel="nofollow" target="_blank">Factor</a> Analysis, and reporting the output in APA format. The same workflow applies directly to your own dataset — just swap <code>USArrests</code> for your data frame.</p>

<h2>Frequently Asked Questions</h2>



<h3>What is the best R package for PCA?</h3>
<p style="text-align: justify;">For most users, the built-in <code>prcomp()</code> function (stats package) is sufficient and is preferred over <code>princomp()</code> for its numerical stability. For richer visualization, pair it with <code>factoextra</code>. For multivariate <a href="https://www.rstudiodatalab.com/2023/11/exploratory-Data-analysis-international-journals.html" rel="nofollow" target="_blank">exploratory</a> work beyond PCA — including correspondence analysis — <code>FactoMineR</code> is the more comprehensive package.</p>



<h3>How do I plot PCA results in R?</h3>
<p style="text-align: justify;">Use <code>plot(pca)</code> for a quick scree plot, <code>biplot(pca)</code> for a base R biplot of scores and loadings, or <code>factoextra::fviz_pca_biplot(pca)</code> for a fully customizable <code>ggplot2</code> version with color, ellipses, and labels, as shown in Step 5 above.</p>
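<p style="text-align: justify;">The base R options look like this. The sketch writes to a temporary PDF so it also runs in a non-interactive session; that setup is an assumption, not a requirement. In RStudio you can drop the <code>pdf()</code> and <code>dev.off()</code> calls and the plots appear in the Plots pane. The <code>factoextra</code> version is not shown here.</p>

```r
# Base R scree plot and biplot for a scaled PCA
pca <- prcomp(USArrests, scale. = TRUE)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 7)

# plot() on a prcomp object draws a scree plot of the component variances
plot(pca, type = "lines", main = "Scree plot: USArrests")

# biplot() overlays state scores (as row labels) and variable loadings (arrows)
biplot(pca, cex = 0.6, main = "PCA biplot: USArrests")

dev.off()
out_file
```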



<h3>What does prcomp() return in R?</h3>
<p style="text-align: justify;"><code>prcomp()</code> returns an object of class "prcomp" containing: <code>sdev</code> (standard deviations of each component), <code>rotation</code> (the loadings matrix), <code>x</code> (the component scores for each observation), and <code>center</code>/<code>scale</code> (the centering and scaling values used).</p>
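<p style="text-align: justify;">The quickest way to see these components is to inspect the object directly. The minimal sketch below assumes the scaled <code>USArrests</code> fit used throughout. Its last line checks that the scores are the standardized data multiplied by the loadings matrix.</p>

```r
pca <- prcomp(USArrests, scale. = TRUE)

class(pca)        # "prcomp"
names(pca)        # "sdev" "rotation" "center" "scale" "x"

pca$sdev          # standard deviation of each component
pca$rotation      # loadings: variables x components
head(pca$x)       # scores: observations x components
pca$center        # column means subtracted before the SVD
pca$scale         # column standard deviations used for scaling

# Scores equal the standardized data multiplied by the loadings
all.equal(pca$x, scale(USArrests) %*% pca$rotation)
```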



<h3>What is the difference between prcomp and princomp in R?</h3>
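<p style="text-align: justify;"><code>princomp()</code> uses spectral (eigen) decomposition of the covariance matrix, or of the correlation matrix with <code>cor = TRUE</code>, and by default divides by n rather than n - 1 when computing the covariance. <code>prcomp()</code> applies singular value decomposition (SVD) directly to the centered (and optionally scaled) data matrix, which is generally more numerically accurate. R's own documentation recommends <code>prcomp()</code> for this reason.</p>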
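<p style="text-align: justify;">This minimal sketch compares the two functions on <code>USArrests</code>. On standardized data they agree up to the arbitrary sign of each component. On unscaled data the standard deviations differ by the factor sqrt((n - 1) / n), because the two functions use different covariance divisors.</p>

```r
# prcomp(): SVD of the data; princomp(): eigen of the covariance/correlation
p_svd   <- prcomp(USArrests, scale. = TRUE)
p_eigen <- princomp(USArrests, cor = TRUE)

# On standardized data both give the same component standard deviations
all.equal(unname(p_svd$sdev), unname(p_eigen$sdev))

# Loadings agree up to an arbitrary sign flip per component
all.equal(abs(unclass(p_eigen$loadings)), abs(p_svd$rotation),
          check.attributes = FALSE)

# On unscaled data, princomp() divides the covariance by n, prcomp() by n - 1
n <- nrow(USArrests)
all.equal(unname(princomp(USArrests)$sdev),
          prcomp(USArrests)$sdev * sqrt((n - 1) / n))
```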



<h3>How many principal components should I keep?</h3>
<p style="text-align: justify;">There's no universal rule. Triangulate using a scree plot (look for the elbow), the Kaiser criterion (eigenvalue > 1), and a cumulative variance threshold (commonly 80–90%) — Step 4 above shows all three applied to the same dataset, including a case where they disagree.</p>
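<p style="text-align: justify;">The two numeric criteria can be computed in a few lines. This minimal sketch assumes the scaled <code>USArrests</code> fit. On this dataset the Kaiser rule keeps one component, while the 80% cumulative threshold keeps two. The elbow is left to visual inspection of the printed eigenvalues.</p>

```r
# Retention criteria applied to the same scaled PCA
pca <- prcomp(USArrests, scale. = TRUE)

eigenvalues <- pca$sdev^2
cum_var     <- cumsum(eigenvalues) / sum(eigenvalues)

# Kaiser criterion: keep components with eigenvalue > 1
n_kaiser <- sum(eigenvalues > 1)

# Cumulative variance: smallest number of components reaching 80%
n_cum80 <- which(cum_var >= 0.80)[1]

# Scree/elbow: inspect where the eigenvalues stop dropping sharply
round(eigenvalues, 2)

c(kaiser = n_kaiser, cumulative_80 = n_cum80)
```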



<h3>Can PCA be used on categorical variables?</h3>
<p style="text-align: justify;">Standard PCA requires numeric, continuous variables. For <a href="https://www.rstudiodatalab.com/2023/09/Logistic-Regression-in-R-with-Categorical-Variables.html" rel="nofollow" target="_blank">categorical data</a>, use <a href="https://www.rstudiodatalab.com/2023/10/ggplot2-multiple-plots-in-r.html" rel="nofollow" target="_blank">Multiple</a> Correspondence Analysis (MCA) instead, available via <code>FactoMineR::MCA()</code>. MCA applies the same dimensionality-reduction logic to the indicator (dummy-coded) matrix of category memberships. It extends simple correspondence analysis of a two-way <a href="https://www.rstudiodatalab.com/2023/11/secrets-of-r-contingency-tables.html" rel="nofollow" target="_blank">contingency</a> table to many categorical variables.</p>



<h3>What are common mistakes when running PCA in R?</h3>
<p style="text-align: justify;">The three most frequent errors are: (1) omitting <code>scale = TRUE</code>, which lets large-variance variables dominate the first component; (2) not checking beforehand for missing values (the default <code>prcomp()</code> method stops with an error when the data contain <code>NA</code>) or for <a href="https://www.rstudiodatalab.com/2023/10/Remove-Outliers-Data-Cleaning-R.html" rel="nofollow" target="_blank">outliers</a>, which distort the covariance matrix; and (3) misreading the scree plot elbow or ignoring loading signs when interpreting what a component represents.</p>
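<p style="text-align: justify;">The first two mistakes are easy to demonstrate on <code>USArrests</code>. In this minimal sketch, Assault has by far the largest variance, so an unscaled PCA makes PC1 almost a copy of that single variable. Dropping incomplete rows with <code>na.omit()</code> is shown only as one blunt option, not a recommendation.</p>

```r
# Mistake 1: forgetting to scale. The variable with the largest variance
# (Assault) then dominates the first component.
unscaled <- prcomp(USArrests)               # scale. = FALSE by default
scaled   <- prcomp(USArrests, scale. = TRUE)

round(apply(USArrests, 2, var), 1)          # variances on very different scales
round(unscaled$rotation[, 1], 2)            # PC1 is essentially Assault alone
round(scaled$rotation[, 1], 2)              # PC1 now reflects several variables

# Mistake 2: missing values. Check before fitting and handle them explicitly.
anyNA(USArrests)                            # FALSE for this dataset
complete <- na.omit(USArrests)              # one blunt option: drop NA rows
```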



<h3>What are real-world applications of PCA?</h3>
<p style="text-align: justify;">PCA is widely used for image compression, facial recognition feature extraction, gene expression analysis in bioinformatics, customer segmentation in marketing, and as a preprocessing step before <a href="https://www.rstudiodatalab.com/2023/07/how-to-fill-color-regions-in-r-k-means-clustering.html" rel="nofollow" target="_blank">clustering</a> or regression to <a href="https://www.rstudiodatalab.com/2024/06/remove-rows-from-dataframe-based-on-condition.html" rel="nofollow" target="_blank">remove</a> <a href="https://www.rstudiodatalab.com/2023/07/Multicollinearity-Ridge-Regression.html" rel="nofollow" target="_blank">multicollinearity</a>.</p>
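<p style="text-align: justify;">Here is a minimal principal component regression (PCR) sketch of the multicollinearity use case. It assumes the built-in <code>mtcars</code> data, where <code>cyl</code>, <code>disp</code>, <code>hp</code> and <code>wt</code> are strongly correlated. The choice of predictors and of two components is for illustration only. The component scores are uncorrelated by construction, so the regression no longer suffers from collinear inputs.</p>

```r
# Principal component regression: replace correlated predictors with PC scores
predictors <- mtcars[, c("cyl", "disp", "hp", "wt")]
round(cor(predictors), 2)                 # strong pairwise correlations

pcs <- prcomp(predictors, scale. = TRUE)

# Component scores are uncorrelated (off-diagonal correlations are ~0)
round(cor(pcs$x), 10)

# Regress mpg on the first two components instead of the raw predictors
pcr_data <- data.frame(mpg = mtcars$mpg, pcs$x[, 1:2])
fit <- lm(mpg ~ PC1 + PC2, data = pcr_data)
summary(fit)$r.squared
```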





<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is the best R package for PCA?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "For most users, the built-in prcomp() function (stats package) is sufficient and is preferred over princomp() for its numerical stability. For richer visualization, pair it with factoextra. For multivariate exploratory work beyond PCA, FactoMineR is the more comprehensive package."
    }
  },{
    "@type": "Question",
    "name": "How do I plot PCA results in R?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Use plot(pca) for a quick scree plot, biplot(pca) for a base R biplot of scores and loadings, or factoextra::fviz_pca_biplot(pca) for a fully customizable ggplot2 version with color, ellipses, and labels."
    }
  },{
    "@type": "Question",
    "name": "What does prcomp() return in R?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "prcomp() returns an object of class prcomp containing sdev (standard deviations of each component), rotation (the loadings matrix), x (the component scores for each observation), and center/scale (the centering and scaling values used)."
    }
  },{
    "@type": "Question",
    "name": "What is the difference between prcomp and princomp in R?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "princomp() uses spectral (eigen) decomposition of the covariance matrix. prcomp() uses singular value decomposition (SVD) of the data matrix directly, which has slightly better numerical accuracy; R's documentation recommends prcomp() for this reason."
    }
  },{
    "@type": "Question",
    "name": "How many principal components should I keep?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "There is no universal rule. Triangulate using a scree plot elbow, the Kaiser criterion (eigenvalue greater than 1), and a cumulative variance threshold, commonly 80 to 90 percent."
    }
  },{
    "@type": "Question",
    "name": "Can PCA be used on categorical variables?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Standard PCA requires numeric, continuous variables. For categorical data, use Multiple Correspondence Analysis (MCA) instead, available via FactoMineR::MCA()."
    }
  },{
    "@type": "Question",
    "name": "What are common mistakes when running PCA in R?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "The three most frequent errors are skipping scale = TRUE which lets large-magnitude variables dominate the first component, not checking for missing values or outliers beforehand, and misreading the scree plot elbow or ignoring loading signs when interpreting a component."
    }
  },{
    "@type": "Question",
    "name": "What are real-world applications of PCA?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "PCA is widely used for image compression, facial recognition feature extraction, gene expression analysis in bioinformatics, customer segmentation in marketing, and as a preprocessing step before clustering or regression to remove multicollinearity."
    }
  }]
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "BlogPosting",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html"
  },
  "headline": "PCA in R: Step-by-Step Guide (prcomp + ggplot2)",
  "description": "Run PCA in R with prcomp(). Step-by-step guide to scaling, scree plots, biplots, variance explained, and ggplot2 visualization with real output.",
  "image": {
    "@type": "ImageObject",
    "url": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiamON4oF0DKcNhjVbQSsylU98BbmjsLdQs2KSj1YPqDpVSYdCosfznUHzBe2u6GU7mDADf5vSRJi1Hzjy9zi5UuYKjw38JKcmIs83LwQNxbtc4j_G7iw1NNWdvgbsXCxWtTAM0rvI8l96PwyH9UjyJ7aPp0jzTiKbXrvi0JW7Qo2heWMsgseSRpwUEHRg/s1200/Principal%20Component%20Analysis%20in%20R%20I%20PCA%20Explained.png",
    "width": "1200",
    "height": "640"
  },
  "author": {
    "@type": "Person",
    "name": "Zubair Goraya",
    "url": "https://www.blogger.com/profile/14094023908139938513"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Data Analysis",
    "logo": {
      "@type": "ImageObject",
      "url": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXsinGH5wLHGxh8ar91Nq6n3eic-SPa_P_4-hTutwHHiCHU9OqqxyZjiEYP0NQXGxOxK45U3vh4fsbBoIRQdXZSu0d6si24k3fP1wGw6SWSRCZDMA0QxTZyFJfjtdhe8o9E1uP55zZj5xBBgnwW6BXovUu7l1Spr2nUQyMxZGPNNelenkHoyaxHy-_X-o/s150/Data%20Analysis.webp",
      "width": "150",
      "height": "50"
    }
  },
  "datePublished": "2023-09-30",
  "dateModified": "2026-06-20"
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org/", 
  "@type": "Product", 
  "name": "PCA in R: Step-by-Step Guide (prcomp + ggplot2)",
  "image": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiamON4oF0DKcNhjVbQSsylU98BbmjsLdQs2KSj1YPqDpVSYdCosfznUHzBe2u6GU7mDADf5vSRJi1Hzjy9zi5UuYKjw38JKcmIs83LwQNxbtc4j_G7iw1NNWdvgbsXCxWtTAM0rvI8l96PwyH9UjyJ7aPp0jzTiKbXrvi0JW7Qo2heWMsgseSRpwUEHRg/s1200/Principal%20Component%20Analysis%20in%20R%20I%20PCA%20Explained.png",
  "description": "Principal Component Analysis in R using prcomp() and ggplot2, including scree plots, biplots, variance explained, and APA reporting.",
  "brand": {
    "@type": "Brand",
    "name": "Data Analysis"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "bestRating": "5",
    "worstRating": "1",
    "ratingCount": "150"
  }
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": "Zubair Goraya",
  "url": "https://zubairgoraya.rstudiodatalab.com/",
  "image": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXfle0EkfKiRR_8FJF9PzmQi7KFzefg1cqMqkdjLNB8fmLyfS1XkuwSk0JGKM00WdzzLhzgHNP7xbgCW9QSIqUDoSUDiqK8Xz-GEpwiqOffmpE0LJDR92r6MVIMsRjhLwhUAK9F-DVddAqtZmzFZW3jwrOGXxX3ThTT0nGzX8x11bkIzc/s220/WhatsApp_Image_2022-07-16_at_7.55.30_PM-removebg-preview-removebg-preview.webp",
  "sameAs": [
    "https://www.facebook.com/zubeegoraya",
    "https://twitter.com/ZubairGoraya",
    "https://youtube.com/@data.03"
  ],
  "jobTitle": "PhD Scholar",
  "worksFor": {
    "@type": "Organization",
    "name": "Blogger"
  }  
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Data Analysis",
  "alternateName": "Statistical Data Analysis",
  "url": "https://www.rstudiodatalab.com/",
  "logo": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXsinGH5wLHGxh8ar91Nq6n3eic-SPa_P_4-hTutwHHiCHU9OqqxyZjiEYP0NQXGxOxK45U3vh4fsbBoIRQdXZSu0d6si24k3fP1wGw6SWSRCZDMA0QxTZyFJfjtdhe8o9E1uP55zZj5xBBgnwW6BXovUu7l1Spr2nUQyMxZGPNNelenkHoyaxHy-_X-o/s150/Data%20Analysis.webp",
  "sameAs": [
    "https://www.facebook.com/DataAnalysis03",
    "https://twitter.com/Zubair01469079",
    "https://www.instagram.com/dataanalysis03/",
    "https://youtube.com/@data.03"
  ]
}
</script>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rstudiodatalab.com/2023/09/principal-component-analysis-in-r-i-pca.html"> RStudioDataLab</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/pca-in-r-principal-component-analysis-step-by-step-prcomp-ggplot2/">PCA in R: Principal Component Analysis Step-by-Step (prcomp + ggplot2)</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402157</post-id>	</item>
		<item>
		<title>Celebrating Our Maintainers during Maintainers Month</title>
		<link>https://www.r-bloggers.com/2026/06/celebrating-our-maintainers-during-maintainers-month/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/06/19/maintainers-month/</guid>

					<description><![CDATA[<p>May was Open Source Software Maintainer Month.<br />
Behind every R package there is at least one person who responds to issues, reviews pull requests, keeps up with dependency changes, and makes sure everything still works.<br />
During Maintainer Month we wante...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/celebrating-our-maintainers-during-maintainers-month/">Celebrating Our Maintainers during Maintainers Month</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/06/19/maintainers-month/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>

<p>May was Open Source Software Maintainer Month.
Behind every R package there is at least one person who responds to issues, reviews pull requests, keeps up with dependency changes, and makes sure everything still works.
During Maintainer Month we wanted to celebrate rOpenSci’s package maintainer community.</p>
<h2>
The social media campaign
</h2><p>One of our commitments to our community is to amplify the people who make it work. Social media is one of the ways we do that, so we thought Maintainer Month would be a great opportunity to highlight the people behind the packages through a social media campaign.</p>
<p>To run this campaign, we first needed permission from our maintainers to feature them. In our annual maintainer survey, we asked whether they would be interested in being featured in a public spotlight, and many said yes.</p>
<p>We also reached out to current and past Champions from our Champions Program, which trains and supports R developers from historically underrepresented groups in the open science community.</p>
<p>The result was a month-long series of spotlights: one maintainer at a time, each card sharing who they are, where they come from, and what they maintain.</p>
<figure><img src="https://i0.wp.com/ropensci.org/blog/2026/06/19/maintainers-month/examples_post.png?w=578&#038;ssl=1"
alt="First post on Mastodon announcing the Maintainer Month campaign, and the Ronald M. Visser and Maëlle Salmon posts on LinkedIn" data-recalc-dims="1">
</figure>
<p>This campaign brought together 37 maintainers from 15 countries, maintaining more than 50 packages that together serve thousands of researchers and data practitioners around the world.</p>
<p>The diversity of this group reflects the diversity of the rOpenSci community: archaeologists, bioinformaticians, ecologists, economists, statisticians, sociologists, professors, PhD students, engineers and educators.</p>
<p>We created 39 posts on our LinkedIn and Mastodon accounts (our Mastodon account is bridged to Bluesky). Every post was shared by other people and organizations and received comments from grateful users.</p>
<h2>
Meet all 37 maintainers
</h2><p>Here is the full list of maintainers we celebrated in May.</p>
<figure><img src="https://i1.wp.com/ropensci.org/blog/2026/06/19/maintainers-month/Maintainermonth.png?w=578&#038;ssl=1"
alt="All 37 maintainers&#39; profile pictures inside hex geometrics form" data-recalc-dims="1">
</figure>
<ul>
<li>
<p><strong>Alex Koiter</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/mbquartR" rel="nofollow" target="_blank">{mbquartR}</a>, for working with Manitoba’s quarter-section land survey system in watershed and land management research.</p>
</li>
<li>
<p><strong>Andrea Gomez Vargas</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1f4.png" alt="🇨🇴" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1f7.png" alt="🇦🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://soyandrea.github.io/arcenso/" rel="nofollow" target="_blank">{ARcenso}</a>, for accessing and analyzing Argentina’s national census data in R. Champions project.</p>
</li>
<li>
<p><strong>Austin Koontz</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/SymbiotaR2" rel="nofollow" target="_blank">{SymbiotaR2}</a>, an R interface to the Symbiota platform for accessing and managing biodiversity occurrence data from natural history collections.</p>
</li>
<li>
<p><strong>Bilikisu Wunmi Olatunji</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f3-1f1ec.png" alt="🇳🇬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://github.com/BWOlatunji/chartkickR" rel="nofollow" target="_blank">{chartkickR}</a>, an R wrapper for the Chartkick JavaScript library that makes it easy to create beautiful interactive charts and visualizations from R. Champions project.</p>
</li>
<li>
<p><strong>Carolina Pradier</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1f7.png" alt="🇦🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/eph" rel="nofollow" target="_blank">{eph}</a>, for downloading and analyzing microdata from Argentina’s Permanent Household Survey, supporting labour and socioeconomic research. Champions project.</p>
</li>
<li>
<p><strong>Daniel Vartanian</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e7-1f1f7.png" alt="🇧🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/mctq" rel="nofollow" target="_blank">{mctq}</a>, for processing data from the Munich Chronotype Questionnaire in sleep and chronobiology research.</p>
</li>
<li>
<p><strong>Erick Navarro Delgado</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f2-1f1fd.png" alt="🇲🇽" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://ericknavarrod.github.io/RAMEN/" rel="nofollow" target="_blank">{RAMEN}</a>, for identifying associations between environmental exposures and molecular outcomes in multi-omics research. Champions project.</p>
</li>
<li>
<p><strong>Erika Siregar</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1ee-1f1e9.png" alt="🇮🇩" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1ec-1f1e7.png" alt="🇬🇧" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://erikaris.github.io/rplaywright/" rel="nofollow" target="_blank">{rplaywright}</a>, an R interface to Microsoft Playwright for browser automation and web testing. Champions project.</p>
</li>
<li>
<p><strong>Ezekiel Adebayo Ogundepo</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f3-1f1ec.png" alt="🇳🇬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://gbganalyst.github.io/bulkreadr/" rel="nofollow" target="_blank">{bulkreadr}</a>, for simplifying the bulk import of multiple files into R across a range of formats. Champions project.</p>
</li>
<li>
<p><strong>Francesca Palmeira</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e7-1f1f7.png" alt="🇧🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://fblpalmeira.github.io/pcir/" rel="nofollow" target="_blank">{pcir}</a>, for modeling species interaction data and food web structures in conservation research. Champions project.</p>
</li>
<li>
<p><strong>Guadalupe Pascal</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1f7.png" alt="🇦🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/matildaNLP" rel="nofollow" target="_blank">{matildaNLP}</a>, a package with a specialized corpus of Spanish texts from the Matilda initiative to support research on gender-aware language processing and policy. Champions project.</p>
</li>
<li>
<p><strong>Haydée Svab</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e7-1f1f7.png" alt="🇧🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://hsvab.github.io/odbr/" rel="nofollow" target="_blank">{odbr}</a>, for accessing open urban mobility data from several Brazilian cities. Champions project.</p>
</li>
<li>
<p><strong>Jeroen Ooms</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f3-1f1f1.png" alt="🇳🇱" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/magick" rel="nofollow" target="_blank">{magick}</a>, <a href="https://docs.ropensci.org/pdftools" rel="nofollow" target="_blank">{pdftools}</a>, and <a href="https://docs.ropensci.org/gert" rel="nofollow" target="_blank">{gert}</a>, packages for image processing, PDF manipulation, and Git operations in R.</p>
</li>
<li>
<p><strong>Jonathan Keane</strong> maintains <a href="https://docs.ropensci.org/dittodb" rel="nofollow" target="_blank">{dittodb}</a>, which makes testing database-backed code easy by recording and replaying real database interactions so tests can run without a live connection.</p>
</li>
<li>
<p><strong>Julia Silge</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/qualtRics" rel="nofollow" target="_blank">{qualtRics}</a>, for importing survey data from the Qualtrics platform directly into R.</p>
</li>
<li>
<p><strong>Karl Broman</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/chromer" rel="nofollow" target="_blank">{chromer}</a> and <a href="https://docs.ropensci.org/aRxiv" rel="nofollow" target="_blank">{aRxiv}</a>, for accessing chromosome data and the arXiv preprint server.</p>
</li>
<li>
<p><strong>Maëlle Salmon</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1eb-1f1f7.png" alt="🇫🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/saperlipopette" rel="nofollow" target="_blank">{saperlipopette}</a>, <a href="https://docs.ropensci.org/babelquarto" rel="nofollow" target="_blank">{babelquarto}</a>, and <a href="https://docs.ropensci.org/babeldown" rel="nofollow" target="_blank">{babeldown}</a>, tools for learning how to use Git, creating multilingual Quarto documents, and supporting translation workflows.</p>
</li>
<li>
<p><strong>Marcelo S. Perlin</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e7-1f1f7.png" alt="🇧🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/yfR" rel="nofollow" target="_blank">{yfR}</a>, for importing financial data from Yahoo Finance into R.</p>
</li>
<li>
<p><strong>Marcos Prunello</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1f7.png" alt="🇦🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/karel/" rel="nofollow" target="_blank">{karel}</a>, a package that brings the Karel the Robot programming environment to R, designed to teach programming concepts and computational thinking to beginners. Champions project.</p>
</li>
<li>
<p><strong>Mark Padgham</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e9-1f1ea.png" alt="🇩🇪" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/pkgcheck" rel="nofollow" target="_blank">{pkgcheck}</a>, which automates software checks for packages submitted to rOpenSci peer review.</p>
</li>
<li>
<p><strong>Mauro Loprete</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1fe.png" alt="🇺🇾" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://metasurveyr.github.io/metasurvey/" rel="nofollow" target="_blank">{metasurvey}</a>, for processing and analyzing household survey microdata using a metadata-driven approach. Champions project.</p>
</li>
<li>
<p><strong>Micha Silver</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1ee-1f1f1.png" alt="🇮🇱" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/rOPTRAM" rel="nofollow" target="_blank">{rOPTRAM}</a>, implementing the OPtical TRApezoid Model for estimating soil moisture from satellite imagery.</p>
</li>
<li>
<p><strong>Moritz Hennicke</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e7-1f1ea.png" alt="🇧🇪" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/nuts" rel="nofollow" target="_blank">{nuts}</a>, for working with the EU’s Nomenclature of Territorial Units for Statistics, useful in regional economics and policy research.</p>
</li>
<li>
<p><strong>Pao Corrales</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1f7.png" alt="🇦🇷" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1fa.png" alt="🇦🇺" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/agroclimatico" rel="nofollow" target="_blank">{agroclimatico}</a>, for calculating agroclimatic indices and bioclimatic variables for agricultural and environmental research. Champions project.</p>
</li>
<li>
<p><strong>Peter Desmet</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e7-1f1ea.png" alt="🇧🇪" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/frictionless" rel="nofollow" target="_blank">{frictionless}</a>, for working with open data standards and publishing datasets.</p>
</li>
<li>
<p><strong>Philippe Massicotte</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/rnaturalearth" rel="nofollow" target="_blank">{rnaturalearth}</a>, <a href="https://docs.ropensci.org/rnaturalearthdata" rel="nofollow" target="_blank">{rnaturalearthdata}</a>, and <a href="https://docs.ropensci.org/gitignore" rel="nofollow" target="_blank">{gitignore}</a>, for working with Natural Earth map data and generating .gitignore files for projects.</p>
</li>
<li>
<p><strong>Sam Albers</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/tidyhydat" rel="nofollow" target="_blank">{tidyhydat}</a>, for accessing Canadian hydrometric data in a tidy format.</p>
</li>
<li>
<p><strong>Steffi Lazerte</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/weathercan" rel="nofollow" target="_blank">{weathercan}</a>, for downloading Canadian weather data directly from Environment and Climate Change Canada.</p>
</li>
<li>
<p><strong>Tad Dallas</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/helminthR" rel="nofollow" target="_blank">{helminthR}</a>, for accessing the London Natural History Museum’s host-parasite database.</p>
</li>
<li>
<p><strong>Ronald M. Visser</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f3-1f1f1.png" alt="🇳🇱" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/dendroNetwork" rel="nofollow" target="_blank">{dendroNetwork}</a>, for creating and analyzing networks in dendrochronological research, combining archaeology and data science.</p>
</li>
<li>
<p><strong>Sehrish Kanwal</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e6-1f1fa.png" alt="🇦🇺" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://umccr.github.io/RNAsum/" rel="nofollow" target="_blank">{RNAsum}</a>, for summarising and visualising RNA-seq data analysis results in clinical cancer genomics workflows. Champions project.</p>
</li>
<li>
<p><strong>Victor Ordu</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f3-1f1ec.png" alt="🇳🇬" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/naijR" rel="nofollow" target="_blank">{naijR}</a>, a package of tools and utilities for working with data and maps about Nigeria. Champions project.</p>
</li>
<li>
<p><strong>Will Gearty</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/rredlist" rel="nofollow" target="_blank">{rredlist}</a>, for accessing IUCN Red List data on threatened species.</p>
</li>
<li>
<p><strong>Will Landau</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/targets" rel="nofollow" target="_blank">{targets}</a>, a pipeline toolkit that makes data analysis in R faster and fully reproducible by tracking dependencies and only re-running what has changed.</p>
</li>
<li>
<p><strong>Will Pearse</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1ec-1f1e7.png" alt="🇬🇧" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/suppdata" rel="nofollow" target="_blank">{suppdata}</a>, for downloading supplementary data files directly from published scientific articles across major journals.</p>
</li>
<li>
<p><strong>Yi-Chin Sunny Tseng</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1e8-1f1e6.png" alt="🇨🇦" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1f9-1f1fc.png" alt="🇹🇼" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://sunnytseng.github.io/bbsTaiwan/" rel="nofollow" target="_blank">{bbsTaiwan}</a>, for accessing and analyzing data from Taiwan’s Breeding Bird Survey. Champions project.</p>
</li>
<li>
<p><strong>Zhian Kamvar</strong> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /> maintains <a href="https://docs.ropensci.org/tinkr" rel="nofollow" target="_blank">{tinkr}</a>, for reading and writing Markdown documents in R as XML.</p>
</li>
</ul>
<h2>
Thank you Maintainers!
</h2><p>Maintaining open source software is an act of generosity. It takes time that could be spent elsewhere, and it often goes unacknowledged.
Every bug fix, every answered issue, every new feature and update is a small gift to the people who depend on that package.</p>
<p>We are grateful to all the rOpenSci maintainers.
If you use any of these packages, consider saying <em>thank you</em>.
You can also let us know how you use these packages by <a href="https://github.com/orgs/ropensci/discussions" rel="nofollow" target="_blank">sharing your use case</a>, which we will <a href="https://ropensci.org/usecases/" rel="nofollow" target="_blank">feature on our website</a>.</p>
<p>Want to learn more? Explore <a href="https://ropensci.org/packages" rel="nofollow" target="_blank">rOpenSci’s packages</a> on our website and browse the other <a href="https://r-universe.dev/search" rel="nofollow" target="_blank">package universes</a> on R-Universe.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/06/19/maintainers-month/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/celebrating-our-maintainers-during-maintainers-month/">Celebrating Our Maintainers during Maintainers Month</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402145</post-id>	</item>
		<item>
		<title>EuroBioC2026 conference recap</title>
		<link>https://www.r-bloggers.com/2026/06/eurobioc2026-conference-recap/</link>
		
		<dc:creator><![CDATA[Laurah Ondari]]></dc:creator>
		<pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>The European Bioconductor Conference 2026 (EuroBioC2026) took place from June 3-5, 2026, in Turku, Finland. Hosted by the University of Turku and the Finnish Society for Bioinformatics at BioCity, the conference brought together the Bioconducto...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/eurobioc2026-conference-recap/">EuroBioC2026 conference recap</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/"> Bioconductor community blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
 





<p><a href="https://i2.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/eurobioc-main-image.jpg?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-1" rel="nofollow" target="_blank"><img src="https://i2.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/eurobioc-main-image.jpg?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1"></a></p>
<p>The European Bioconductor Conference 2026 (<a href="https://eurobioc2026.bioconductor.org/" rel="nofollow" target="_blank">EuroBioC2026</a>) took place from June 3–5, 2026, in Turku, Finland. Hosted by the <a href="https://www.utu.fi/en" rel="nofollow" target="_blank">University of Turku</a> and the <a href="https://www.bioinf.fi/" rel="nofollow" target="_blank">Finnish Society for Bioinformatics</a> at BioCity, the conference brought together the Bioconductor community to showcase the latest developments in Bioconductor software packages and discuss emerging technologies shaping computational biology. This year’s conference welcomed 147 in-person participants from 23 countries. Across three days, attendees participated in keynote lectures, short and flash talks, workshops, poster sessions, Birds-of-a-Feather discussions, and community events. The conference also marked an important milestone for the project as Bioconductor celebrated its 25th anniversary. The figure below summarises EuroBioC2026 at a glance: 147 attendees from 23 countries, 4 keynote speakers, 25 speakers, 68 posters, 6 workshops, 9 flash talks, and 3 Birds-of-a-Feather sessions.</p>
<p><a href="https://i2.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/eurobioc-by-numbers.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-2" rel="nofollow" target="_blank"><img src="https://i2.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/eurobioc-by-numbers.png?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1"></a></p>
<section id="participants-by-country" class="level2">
<h2 class="anchored" data-anchor-id="participants-by-country">Participants by country</h2>
<p>Participants travelled to Turku from across Europe and beyond, reflecting the increasingly global nature of the Bioconductor community. While Finland represented the largest delegation, attendees also joined from Italy, Belgium, Germany, Switzerland, the United States, Sweden, the United Kingdom, Ireland, Spain, Kenya, South Korea, Australia, and several other countries.</p>
<iframe src="https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/eurobioc2026-participants-map.html" width="450" height="600" frameborder="0">
</iframe>
</section>
<section id="preconference" class="level2">
<h2 class="anchored" data-anchor-id="preconference">Preconference</h2>
<p>Ahead of the main conference, EuroBioC2026 hosted two preconference events on June 1-2. These were delivered in collaboration with the University of Turku, CompLifeSci, the Finnish Society for Bioinformatics, and members of the Bioconductor community. Running in parallel over two days, the events allowed participants to either strengthen their analytical skills through hands-on training or contribute directly to the development of Bioconductor software through collaborative coding projects.</p>
<section id="workshop-orchestrating-microbiome-analysis-with-bioconductor" class="level3">
<h3 class="anchored" data-anchor-id="workshop-orchestrating-microbiome-analysis-with-bioconductor">Workshop: Orchestrating Microbiome Analysis with Bioconductor</h3>
<p>The preconference workshop focused on microbiome data analysis using Bioconductor and followed the Bioconductor Carpentry model, combining interactive instruction with practical exercises. Over two days, participants learned how to import, process, and analyse microbiome datasets using the established Bioconductor workflow, <a href="https://microbiome.github.io/OMA/docs/devel/" rel="nofollow" target="_blank">Orchestrating Microbiome Analysis (OMA)</a>. The workshop covered diversity analyses, differential abundance testing, and approaches for integrating microbiome data with other omics data types. A key component of the workshop was the use of cloud computing resources from <a href="https://csc.fi/en/" rel="nofollow" target="_blank">CSC</a> (Finnish IT Centre for Science) through <a href="https://noppe.2.rahtiapp.fi/welcome" rel="nofollow" target="_blank">Noppe</a>, which gave participants immediate access to all required datasets, software, and computing resources. Because this removed installation and configuration barriers, instructors were able to begin teaching immediately and spend more time on the workshop content rather than on troubleshooting technical issues. The platform also ensured that all participants worked within the same environment, creating a smoother learning experience for everyone involved. The workshop was hands-on throughout: participants asked questions, worked through exercises, and discussed how the workflows related to their own projects. This practical, open, instructor-led format aligns well with the approach shared by both Bioconductor and The Carpentries. The workshop was led by Leo Lahti, Himel Mallick, Thomaz Bastiaanssen, Tuomas Borman, and Giulio Benedetti.</p>
<img src="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/benedetti-workshop.jpg?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1">
<center>
<p><em>Participants during the pre-conference microbiome workshop.</em></p>
</center>
</section>
<section id="hackathon" class="level3">
<h3 class="anchored" data-anchor-id="hackathon">Hackathon</h3>
<p>We held the first of our series of hackathons attached to Bioconductor conferences this June at EuroBioC2026 in Turku, Finland. Eighteen in-person attendees worked on four projects focused on interoperability, and at least three of those are now being prepared for submission to <a href="https://index.biohackrxiv.org/tag/EuroBioc2026" rel="nofollow" target="_blank">BioHackrXiv</a>.</p>
<p>A big congratulations to all the participants for their effort. You can read more about the projects in the <a href="https://github.com/BiocCodingCollaborations/EuroBioc2026_Hackathon" rel="nofollow" target="_blank">EuroBioC2026 Hackathon</a> repository. We’re looking forward to building on this work at the North American Bioconductor conference, BioC2026, in Seattle this August. See the <a href="https://github.com/BiocCodingCollaborations/BiocNA2026_Hackathon" rel="nofollow" target="_blank">BioC2026 Hackathon</a> for more details.</p>
<img src="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/hackathon.jpg?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1">
<center>
<p><em>Participants during the pre-conference hackathon.</em></p>
</center>
</section>
</section>
<section id="programme-overview" class="level2">
<h2 class="anchored" data-anchor-id="programme-overview">Programme overview</h2>
<p>EuroBioC2026 covered both established and emerging areas of computational biology, with a consistent focus on reproducible and open-source research.</p>
<section id="keynotes" class="level3">
<h3 class="anchored" data-anchor-id="keynotes">Keynotes</h3>
<p>Keynotes at EuroBioC2026 covered functional genomics, machine learning, microbiome research, and the direction of computational biology. Across four talks, speakers addressed how data science is changing biological research, and what that means for reproducibility, interpretation, and open-source software.</p>
<p><strong>Helena Kilpinen</strong> opened the conference with <em>Morphological profiling of in vitro neurons: Visualizing complexity in cellular disease models</em>. Her talk explored how high-content imaging and morphological profiling can be used to better understand cellular phenotypes in disease models. By combining large-scale imaging data with computational approaches, she demonstrated how researchers can uncover subtle cellular differences that may provide insights into disease mechanisms.</p>
<p><strong>Anders Krogh</strong> presented <em>A Deep Generative Model for Gene Expression and Multimodal Data</em>, showcasing how modern machine learning approaches can be used to model increasingly complex biological datasets. His keynote highlighted the potential of generative models to integrate multiple data modalities and improve our understanding of gene regulation and cellular states.</p>
<p><strong>Aura Raulo</strong> delivered a keynote titled <em>Modeling the spread of microbial communities in contact networks</em>. Drawing on concepts from ecology, microbiology, and network science, they explored how microbial communities are transmitted between individuals and populations. The talk examined how host-associated microbiomes are shaped and shared, and what drives their spread.</p>
<p>The final keynote was delivered by <strong>Levi Waldron</strong>, who addressed a topic now central to many scientific discussions: <em>Bioconductor in the age of AI. What do we do now?</em> His talk examined the opportunities and challenges that AI presents for open-source scientific software. He encouraged the community to think about how AI tools can complement existing work, without compromising the transparency, reproducibility, and scientific rigour the project has built over 25 years.</p>
</section>
<section id="short-talks-and-flash-talks" class="level3">
<h3 class="anchored" data-anchor-id="short-talks-and-flash-talks">Short talks and flash talks</h3>
<p>The short talks at EuroBioC2026 reflected the diversity of the Bioconductor community, spanning topics from single-cell and spatial biology to microbiome research, proteomics, metabolomics, and multi-omics data integration. Several presentations introduced new software packages and statistical methods aimed at improving reproducibility, scalability, and interoperability in biological data analysis. Alongside methodological advances, speakers also covered broader community topics, including sustainable open-source software, environmentally conscious computing, training initiatives, and the growing role of artificial intelligence in computational biology. Together, the talks provided a good picture of the scientific questions being addressed with Bioconductor and the people driving its development.</p>
</section>
<section id="poster-sessions" class="level3">
<h3 class="anchored" data-anchor-id="poster-sessions">Poster sessions</h3>
<p>The 68 posters presented at EuroBioC2026 covered a broad mix of biological applications and software development. Topics included spatial omics, microbiome research, proteomics, metabolomics, disease modelling, and machine learning, as well as new packages and infrastructure projects from across the Bioconductor ecosystem. The poster sessions encouraged interactions between package developers, researchers, students, and first-time conference attendees, helping strengthen collaborations across the community.</p>
<img src="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/poster-session.jpg?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1">
<center>
<p><em>EuroBioC2026 participants during a poster session.</em></p>
</center>
</section>
<section id="birds-of-a-feather-sessions" class="level3">
<h3 class="anchored" data-anchor-id="birds-of-a-feather-sessions">Birds-of-a-Feather sessions</h3>
<p>The three 90-minute Birds-of-a-Feather (BoF) sessions gave attendees an opportunity to connect around shared interests, exchange experiences, discuss challenges, and share ideas. The sessions were proposed by participants during the conference and focused on strengthening the Finnish Bioconductor community, supporting early-career researchers, and embedding environmental sustainability into Bioconductor packages and research workflows. One outcome from the early-career researcher discussion was the creation of a dedicated student–ECR Zulip channel to support continued connection within the community. The BoF sessions continued a tradition of community-led discussion that has been part of Bioconductor events for years.</p>
</section>
<section id="workshops" class="level3">
<h3 class="anchored" data-anchor-id="workshops">Workshops</h3>
<p>The workshop sessions offered attendees an opportunity to explore a range of Bioconductor tools and workflows through hands-on demonstrations led by community members. Topics included proteomics data analysis, integrative analysis of histopathological images and multi-omics data, ChIP-seq analysis, differential expression analysis, post-translational modification analysis, and interoperable mass spectrometry workflows combining R and Python. Participants had the opportunity to engage directly with instructors, ask questions, and learn how the presented tools could be applied to their own research projects. Together, the workshops showcased the breadth of analytical domains supported by the Bioconductor ecosystem.</p>
</section>
<section id="celebrating-25-years-of-bioconductor" class="level3">
<h3 class="anchored" data-anchor-id="celebrating-25-years-of-bioconductor">Celebrating 25 years of Bioconductor</h3>
<p>A major highlight of EuroBioC2026 was the celebration of Bioconductor’s 25th anniversary. Since its founding in 2001, Bioconductor has grown from a small collection of software packages into a global open-source community used by thousands of researchers worldwide. Over the past quarter-century, it has become a central resource for reproducible computational biology, providing infrastructure, software, training, and community support across numerous biological disciplines.</p>
<p><a href="https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/bioc25years.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-3" rel="nofollow" target="_blank"><img src="https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/bioc25years.svg" class="zoomable img-fluid" style="width:100.0%"></a></p>
<p>One of the highlights of the celebration was a retrospective presented by Maria Doyle, Bioconductor Community Manager, who took attendees through the history of Bioconductor, from the earliest contribution on the Bioconductor support site to the project’s growth into the global community it is today. The presentation highlighted how the project has evolved over the past 25 years and its impact on computational biology. The celebrations continued at the conference dinner, where attendees marked the occasion with a special anniversary cake. During the evening, Levi Waldron, one of Bioconductor’s Principal Investigators, shared a personal reflection on his journey with Bioconductor, from first encountering the project through his collaborations with Martin Morgan to becoming part of its leadership.</p>
<img src="https://i2.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/levispeech.jpg?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1">
<center>
<p><em>Levi Waldron shares a personal reflection on his journey with Bioconductor during the 25th anniversary celebrations.</em></p>
</center>
</section>
</section>
<section id="infrastructure-and-tools" class="level2">
<h2 class="anchored" data-anchor-id="infrastructure-and-tools">Infrastructure and tools</h2>
<section id="zulip" class="level3">
<h3 class="anchored" data-anchor-id="zulip">Zulip</h3>
<p>EuroBioC2026 continued to use Zulip as its primary communication platform. A dedicated conference channel and a separate hackathon channel, both organised into topic-based threads, served as central locations for announcements, technical support, social interactions, and discussions before, during, and after the event. The threaded conversation model made it easier to follow discussions and kept participants connected throughout the conference.</p>
</section>
<section id="sticker-hexwall" class="level3">
<h3 class="anchored" data-anchor-id="sticker-hexwall">Sticker Hexwall</h3>
<p>The sticker hexwall returned for EuroBioC2026 following its successful introduction in 2025. The display showcased Bioconductor package stickers contributed by Bioconductor community members and served as a visual representation of the diversity of software projects within the ecosystem.</p>
<p>The hexwall quickly became a popular gathering point and photo location throughout the conference.</p>
<img src="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/hexwall.jpg?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1">
<center>
<p><em>The hexwall at EuroBioC2026.</em></p>
</center>
</section>
</section>
<section id="social-interactions-and-networking" class="level2">
<h2 class="anchored" data-anchor-id="social-interactions-and-networking">Social interactions and networking</h2>
<section id="conference-dinner" class="level3">
<h3 class="anchored" data-anchor-id="conference-dinner">Conference Dinner</h3>
<p>The conference dinner took place on the island of Ruissalo, one of Turku’s most popular recreational areas and the gateway to the Turku Archipelago. It was hosted at the historic Villa Marjaniemi, a 150-year-old villa overlooking the sea, and the evening was inspired by Juhannus, Finland’s traditional midsummer celebration.</p>
<p>Attendees were welcomed by a live band as they arrived, then enjoyed dinner and celebrations marking 25 years of Bioconductor. The evening continued with outdoor games and activities, and was a good chance to catch up with familiar faces and meet people for the first time.</p>
</section>
<section id="walking-tour" class="level3">
<h3 class="anchored" data-anchor-id="walking-tour">Walking tour</h3>
<p>On Thursday evening, participants joined an optional walking tour of the historic city centre, learning about Finnish history along the way. The tour stopped at the Old Great Square and Brinkkala Hall, from whose balcony the annual <a href="https://en.wikipedia.org/wiki/Christmas_Peace" rel="nofollow" target="_blank">Christmas Peace</a> declaration, a tradition dating back to the Middle Ages, is read. The tour also highlighted notable Finnish figures, including the legendary runner <a href="https://en.wikipedia.org/wiki/Paavo_Nurmi" rel="nofollow" target="_blank">Paavo Nurmi</a>, famously known as the “Flying Finn.”</p>
<p>The tour naturally flowed into the evening’s social activities. Some participants stopped at a traditional Finnish grill kiosk to try makkaraperunat, a popular local fast-food dish, while others continued their conversations at Office (Toimisto in Finnish), a local bar where they sang karaoke until 3 AM.</p>
<div class="columns">
<div class="column" style="width:48%;">
<p><a href="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/walking-tour.jpg?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-4" rel="nofollow" target="_blank"><img src="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/walking-tour.jpg?w=578&#038;ssl=1" class="img-fluid" style="width:100.0%" data-recalc-dims="1"></a></p>
</div><div class="column" style="width:48%;">
<p><a href="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/dinner.jpg?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-5" rel="nofollow" target="_blank"><img src="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/dinner.jpg?w=578&#038;ssl=1" class="img-fluid" style="width:100.0%" data-recalc-dims="1"></a></p>
</div>
</div>
<center>
<p><em>EuroBioC2026 participants during the walking tour (left) and enjoying the conference dinner (right).</em></p>
</center>
</section>
</section>
<section id="conference-materials" class="level2">
<h2 class="anchored" data-anchor-id="conference-materials">Conference materials</h2>
<p>Conference recordings will be available on the <a href="https://www.youtube.com/@bioconductor" rel="nofollow" target="_blank">Bioconductor YouTube channel</a> in the coming weeks. Auditorium sessions were also live streamed, and Slido was used to facilitate audience questions from both in-person and remote participants, alongside traditional in-room discussion. Presenters were encouraged to upload their slides, posters, and supplementary materials to the <a href="https://zenodo.org/communities/bioconductor" rel="nofollow" target="_blank">Bioconductor Zenodo Community</a>, making conference outputs openly available and citable through persistent digital object identifiers (DOIs).</p>
<p>These resources make conference outputs available to those who could not attend and support continued learning across the community. Additional photos from EuroBioC2026, including talks, workshops, posters, social events, and the conference dinner, are available in the <a href="https://eurobioc2026.bioconductor.org/pages/photo-gallery.html" rel="nofollow" target="_blank">conference photo gallery</a>. A short recap video capturing moments from the conference is also available <a href="https://youtube.com/shorts/PdMdkpTPMeM?si=Q7hu4AwztXEsZuZM" rel="nofollow" target="_blank">on YouTube</a>.</p>
</section>
<section id="coming-up" class="level2">
<h2 class="anchored" data-anchor-id="coming-up">Coming up…</h2>
<p>The 25th anniversary year will continue when the Bioconductor community gathers next at <a href="https://bioc2026.bioconductor.org/" rel="nofollow" target="_blank">BioC2026</a>, which will take place from August 10–12, 2026, at the Fred Hutch Cancer Center in Seattle, Washington. The conference will continue the tradition of bringing together developers, researchers, and educators to share new software, methods, and applications in computational biology.</p>
<p>Later in the year, the community will head to Melbourne, Australia, for <a href="https://biocasia2026.bioconductor.org/" rel="nofollow" target="_blank">BioCAsia2026</a>, taking place on November 19–20, 2026, immediately following the ABACBS conference. BioCAsia brings together researchers across the Asia-Pacific region for scientific exchange, training, and community building. The <a href="https://brisbanebioinformatics.org/event/qld-week-biocasia/" rel="nofollow" target="_blank">BioCAsia Seminar Series</a> has also expanded to a bi-monthly schedule, alongside growing regional initiatives such as the <a href="https://training.bioconductor.org/workshops/bioc-africa-seminars/" rel="nofollow" target="_blank">Bioconductor Africa Seminar Series</a> and the Bioconductor Latin America seminar series. Stay connected with the community through dedicated Zulip channels.</p>
<p>EuroBioC2026 concluded with an invitation to Basel, Switzerland, where EuroBioC2027 will take place from September 8–10, 2027. See you there.</p>
</section>
<section id="acknowledgements" class="level2">
<h2 class="anchored" data-anchor-id="acknowledgements">Acknowledgements</h2>
<section id="sponsors" class="level3">
<h3 class="anchored" data-anchor-id="sponsors">Sponsors</h3>
<p>EuroBioC2026 gratefully acknowledges the support of all sponsors and partners whose contributions made the conference possible.</p>
<p><a href="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/sponsors-partners.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-6" rel="nofollow" target="_blank"><img src="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/sponsors-partners.png?w=578&#038;ssl=1" class="zoomable img-fluid" style="width:100.0%" data-recalc-dims="1"></a></p>
</section>
<section id="diamond-sponsors" class="level3">
<h3 class="anchored" data-anchor-id="diamond-sponsors">Diamond sponsors</h3>
<ul>
<li><a href="https://www.tsv.fi/en" rel="nofollow" target="_blank">Federation of Finnish Learned Societies</a></li>
<li><a href="https://skr.fi/en/" rel="nofollow" target="_blank">Finnish Cultural Foundation</a></li>
</ul>
</section>
<section id="gold-sponsors" class="level3">
<h3 class="anchored" data-anchor-id="gold-sponsors">Gold sponsors</h3>
<ul>
<li><a href="https://biocityturku.fi/" rel="nofollow" target="_blank">BioCity, Turku</a></li>
<li><a href="https://stiftelsenabo.fi/en/" rel="nofollow" target="_blank">Åbo Akademi University Foundation</a></li>
</ul>
</section>
<section id="bronze-sponsors" class="level3">
<h3 class="anchored" data-anchor-id="bronze-sponsors">Bronze sponsors</h3>
<ul>
<li><a href="https://bigomics.ch/" rel="nofollow" target="_blank">BigOmics Analytics</a></li>
<li><a href="https://www.physalia-courses.org/" rel="nofollow" target="_blank">Physalia Courses</a></li>
<li><a href="https://r-consortium.org/" rel="nofollow" target="_blank">R Consortium</a></li>
<li><a href="https://www.loimu.fi/en/" rel="nofollow" target="_blank">LOIMU</a></li>
<li><a href="https://liedonsaastopankkisaatio.fi/" rel="nofollow" target="_blank">Liedon Säästöpankkisäätiö</a></li>
</ul>
</section>
<section id="supporting-organisations" class="level3">
<h3 class="anchored" data-anchor-id="supporting-organisations">Supporting organisations</h3>
<ul>
<li><a href="https://csc.fi/en/" rel="nofollow" target="_blank">CSC &#8211; IT Center for Science, Finland</a> for providing computational resources for the workshops</li>
<li><a href="https://www.nordic-compbio.org/" rel="nofollow" target="_blank">Nordic Computational Biology</a></li>
</ul>
</section>
<section id="hosts" class="level3">
<h3 class="anchored" data-anchor-id="hosts">Hosts</h3>
<ul>
<li><a href="https://www.utu.fi/en" rel="nofollow" target="_blank">University of Turku</a></li>
<li><a href="https://biocityturku.fi/research-programs/complifesci/" rel="nofollow" target="_blank">CompLifeSci, BioCity Turku</a></li>
<li><a href="https://www.bioinf.fi/" rel="nofollow" target="_blank">Finnish Society for Bioinformatics</a></li>
</ul>
</section>
<section id="organising-committee" class="level3">
<h3 class="anchored" data-anchor-id="organising-committee">Organising committee</h3>
<p>We thank the local organisers, programme committee, workshop instructors, keynote speakers, volunteers, sponsors, and all participants whose contributions made EuroBioC2026 a success.</p>
<p><strong>Organising Committee</strong></p>
<ul>
<li>Leo Lahti (Chair)</li>
<li>Tuomas Borman (Local Chair)</li>
<li>Akewak Jeba (Website)</li>
<li>Anna Kaisanlahti (Local Organiser)</li>
<li>Annekathrin Nedwed</li>
<li>Charlotte Soneson (Scientific Programme)</li>
<li>Dania Machlab</li>
<li>Dario Righelli</li>
<li>Eliana Ibrahimi</li>
<li>Federico Marini</li>
<li>James Dalgleish</li>
<li>Julia Mathlin (Local Organiser)</li>
<li>Kevin Rue-Albrecht</li>
<li>Laurent Gatto</li>
<li>Lieven Clement</li>
<li>Maria Doyle (Communications)</li>
<li>Mark Robinson</li>
<li>Michael Love</li>
<li>Michael Stadler</li>
<li>Miina Vulli (Local Organiser)</li>
<li>Najla Abassi</li>
<li>Nicholas Cooley (Hackathon)</li>
<li>Nyasita Laurah Ondari (Communications)</li>
<li>Robert Castelo</li>
<li>Robert Ivánek</li>
<li>Teemu Daniel Laajala (Local Organiser)</li>
</ul>


</section>
</section>

<p>
© 2026 Bioconductor. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/"> Bioconductor community blog</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/eurobioc2026-conference-recap/">EuroBioC2026 conference recap</a>]]></content:encoded>
					
		
		<enclosure url="https://blog.bioconductor.org/posts/2026-06-19-EuroBioc2026-recap/media/eurobioc-main-image.jpg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">402139</post-id>	</item>
		<item>
		<title>Mistral AI: The SaaS Mirage and the Great Pivot to the Real World</title>
		<link>https://www.r-bloggers.com/2026/06/mistral-ai-le-mirage-du-saas-et-le-grand-pivot-vers-le-reel/</link>
		
		<dc:creator><![CDATA[Boris Guarisma]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 10:19:17 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bguarisma.com/mistral-ai-le-mirage-du-saas-et-le-grand-pivot-vers-le-reel</guid>

					<description><![CDATA[<p>On June 16, 2026, Arthur Mensch, the emblematic head of Mistral AI, posted an unusually solemn message on LinkedIn. As always, the official message celebrates European technolog</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/mistral-ai-le-mirage-du-saas-et-le-grand-pivot-vers-le-reel/">Mistral AI: The SaaS Mirage and the Great Pivot to the Real World</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://blog.bguarisma.com/mistral-ai-le-mirage-du-saas-et-le-grand-pivot-vers-le-reel"> Foundations</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>On June 16, 2026, Arthur Mensch, the emblematic head of Mistral AI, posted an unusually solemn message on LinkedIn. As always, the official message celebrates European technological independence, &#8220;sovereignty,&#8221; and the imminent release of new open weights this summer.</p>
<p>Yet beneath the public-relations varnish and the alignment with national politics lies a major admission:</p>
<blockquote>
<p><em>&#8220;Today, we do not yet have the best language models, but we have consistently narrowed the gap.&#8221;</em></p>
</blockquote>
<p>This sentence is a fundamental key to reading what follows. It marks the end of an illusion. The technological gap in pure general reasoning against the American giants is not closed by algorithmic elegance alone; it widens under the weight of gigawatts of servers and billions of dollars of capital. To survive against Silicon Valley, Europe&#8217;s artificial intelligence champion has just made a discreet strategic pivot of historic scope: <strong>Palantirization</strong>.</p>
<p>By choosing to send teams of elite engineers directly to its customers to bend its models to their complex business processes, Mistral AI is quietly abandoning the dream of the pure software publisher with infinite margins and embracing the human, physical, and regulatory reality of the European economy.</p>
<hr />
<h2>1. The Surrender of &#8220;Pure Software&#8221;</h2>
<p>To measure the scale of this transformation, recall the original promise. In June 2023, Mistral AI&#8217;s seed strategy memo, a seven-page founding document written by defectors from Meta and Google DeepMind, sketched a business model designed to make the world&#8217;s largest venture capital funds salivate. The startup promised an &#8220;Open Core&#8221; strategy built around a free network effect (open models anyone can download) that would funnel users toward its paid software API platform.</p>
<p>It was the promise of the classic SaaS model applied to the cognitive era: a falling marginal cost of inference, ultra-lean teams, no physical infrastructure to manage, and gross margins of nearly 80%. It was on this thesis of pure scalability that investors valued Mistral at $12 billion in less than three years.</p>
<p>But enterprise AI has one stubborn physical property: <strong>it does not install itself</strong>.</p>
<p>Plugging a generic language model into a corporate IT system inherited from the 1990s (<em>legacy</em>), making sure data does not leak to a cloud subcontractor, cleaning semantic databases, and orchestrating fleets of reliable agents takes a colossal human engineering effort. AI is not fluid software; it is heavy infrastructure.</p>
<p>Arthur Mensch&#8217;s answer? The official introduction of the <strong>FDE</strong> model:</p>
<blockquote>
<p><em>&#8220;We help them with professional services (FDE is the fancy name), because it is essential to our customers&#8217; success.&#8221;</em></p>
</blockquote>
<p>The acronym <strong>FDE</strong>, for <em>Forward Deployed Engineers</em>, is the closely guarded trade secret of another American tech giant: <strong>Palantir Technologies</strong>. The model consists of seconding elite engineers directly to the customer&#8217;s offices (whether an investment bank, an aerospace manufacturer, or a ministry of defense) to wire the software into real data sources.</p>
<p>For a long time, Wall Street shunned this business model at Palantir, seeing it as a mere labor-intensive, poorly scalable &#8220;services company&#8221; activity. But today it is the only one that works in Europe. By adopting FDEs, Mistral admits that selling AI on the Old Continent takes more than handing out an API key. You have to send in people to build the bridge.</p>
<hr />
<h2>2. The Weight of Matter and the Resale of Kilowatt-Hours</h2>
<p>The second axis of this transformation is physical and financial. In the first quarter of 2026, Mistral AI took out an unprecedented loan of <strong>$830 million from BNP Paribas</strong>. The collateral for this debt? Not shares, but raw hardware: <strong>13,800 latest-generation compute chips (Nvidia GB300 GPUs)</strong>.</p>
<p>For a software startup, taking on infrastructure debt of this size is an extremely risky bet. If the chips sit idle, or if global inference prices collapse under token dumping by open-source giants (such as China&#8217;s DeepSeek), the company risks financial asphyxiation.</p>
<p>To service its debt and generate immediate cash flow, Mistral has had to go down to the basement. It has gone from model builder to <strong>supplier of cognitive electricity</strong>:</p>
<blockquote>
<p><em>&#8220;Following our investment in compute infrastructure, we also offer hosted AI cloud services.&#8221;</em></p>
</blockquote>
<p>The company now uses its surplus servers to resell raw compute (<em>GPU-as-a-Service</em>), renting its chips to large European industrial groups such as ASML, Ericsson, or the European Space Agency so they can pre-train their own networks on its <strong>Forge</strong> compute platform.</p>
<p>It is a remarkable case of reverse vertical integration. To fund its frontier models, Mistral has to behave like a power-grid operator, selling kilowatt-hours of compute alongside tokens of intelligence.</p>
<hr />
<h2>3. Orchestration Sovereignty: The Interdependence of Silicon</h2>
<p>Arthur Mensch&#8217;s message insists heavily on a level of service and security <em>&#8220;completely decoupled from American providers.&#8221;</em> This is the core of the startup&#8217;s marketing positioning: offering industry, healthcare, and government a trusted environment in the face of Microsoft&#8217;s and Google&#8217;s hegemony.</p>
<p>To some observers, the gap between a discourse of independence and the material reality (Nvidia GB300 chips designed in California and fabricated in Taiwan by TSMC, a capital structure with a strong Anglo-Saxon component) amounts to a contradiction.</p>
<p>That overlooks a fundamental truth imposed by the physics and geopolitics of semiconductors: <strong>in advanced industry, 100% self-sufficiency is a pipe dream</strong>.</p>
<p>Manufacturing a single AI compute chip is the product of radical interdependence and a symbiosis of national monopolies. The United States provides the design and the design software, Japan holds the advanced chemistry and ultra-pure silicon, Germany produces Zeiss optics and Trumpf lasers, and the Netherlands, through ASML&#8217;s absolute monopoly, assembles the EUV lithography machines. Even these Dutch machines depend on patents and technologies of American origin (such as Cymer light sources).</p>
<p>In a network where the access keys are this widely scattered, demanding that Europe or Mistral own the chain end to end is a misreading. Real sovereignty lies not in self-sufficiency but in control of <strong>specific strategic chokepoints</strong>.</p>
<p>This is what should be called <strong>orchestration sovereignty</strong>.</p>
<p>By focusing on physical on-premise hosting (via Vibe) and custom business integration by FDE engineers, Mistral does not claim to make sovereign silicon. It secures the final and most critical layer of the stack: the one that touches its customers&#8217; business data, tacit knowledge, and industrial secrets. You may be renting chips subject to US export laws, but you keep full control over your organization&#8217;s decision-making intelligence and compliance. That is an invaluable lever of trust, and the only pragmatic one for regulated companies and European states (as shown by the defense agreement signed in January 2026 with the French Ministry of the Armed Forces).</p>
<hr />
<h2>Conclusion: A Pragmatic and Risky Pivot</h2>
<p>Mistral AI&#8217;s &#8220;Palantir&#8221; pivot is not an admission of defeat. It is a flash of tactical genius and European pragmatism.</p>
<p>By accepting the physical constraints of the race for raw general-purpose compute, Mistral chooses to anchor itself where real, lasting value is captured on our continent: in the plumbing of critical industrial processes, local agent orchestration, and guaranteed compliance on the ground. Action-oriented AI will not be won only in research labs simulating pure intelligence with teraflops, but in the field, writing code alongside the legacy systems of the real economy.</p>
<p>However, this operational choice comes at a cost. Solutions engineering and forward-deployed engineers (FDEs) weigh heavily on the company&#8217;s gross margin and are harder to scale than a pure software API distribution model. If the overall decline in inference prices continues, Mistral will quickly need to stabilize the recurring revenue from its orchestration software suite to cover repayment of its $830 million infrastructure debt.</p>
<p>Europe may not get its own scalable OpenAI with the infinite margins of an API-access monopoly. But if Arthur Mensch&#8217;s bet pays off, it will have its Palantir of AI: a player able to join physical compute power with high-precision industrial integration. It is the triumph of practical clear-sightedness over the fantasy of self-sufficiency.</p>
<hr />
<h2>Sources and References</h2>
<ul>
<li><p><strong>[1] Mistral AI Strategic Seed Memo (June 2023)</strong>: <em>&#8220;Mistral AI: generative AI at European scale&#8221;</em>, describing the business model of an Open Core software API platform with scalable, SaaS-like gross margins.</p>
</li>
<li><p><strong>[2] Enterprise Reasoning Benchmarks 2025-2026</strong>: Inference evaluations documenting the persistent 12-to-18-month gap between Mistral AI&#8217;s open-weights models and the logical reasoning capabilities of GPT-4o, o1, or Claude 3.5 Sonnet.</p>
</li>
<li><p><strong>[3] Palantir Technologies S-1 Registration Statement (2020)</strong>: Public listing document describing the operating model of <em>Forward Deployed Engineers</em> (FDE).</p>
</li>
<li><p><strong>[4] BNP Paribas &#8211; Mistral AI Infrastructure Financing (Q1 2026)</strong>: $830 million credit facility secured by physical assets (13,800 Nvidia GB300 GPUs).</p>
</li>
<li><p><strong>[5] French Ministry of the Armed Forces &#8211; Mistral AI Framework Agreement (January 2026)</strong>: Protocol for the sovereign on-premise deployment of language models for the French armed forces.</p>
</li>
</ul>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bguarisma.com/mistral-ai-le-mirage-du-saas-et-le-grand-pivot-vers-le-reel"> Foundations</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/mistral-ai-le-mirage-du-saas-et-le-grand-pivot-vers-le-reel/">Mistral AI: The SaaS Mirage and the Great Pivot to the Real World</a>]]></content:encoded>
					
		
		<enclosure url="https://cdn.hashnode.com/uploads/covers/61bc41016916c3593a6b04c3/1db25881-d36a-4a39-b22f-904bffdda680.png" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">402194</post-id>	</item>
		<item>
		<title>Likert Scale Questions: Your In-Depth Guide</title>
		<link>https://www.r-bloggers.com/2026/06/likert-scale-questions-your-in-depth-guide/</link>
		
		<dc:creator><![CDATA[Unknown]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 09:46:51 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=a33731dcb43d9afa478c996b68adee3b</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
A Likert scale (pronounced LICK-ert, not "LIKE-ert") is a psychometric rating scale used in surveys and questionnaires to measure attitudes, opinions, and perceptions. Named after American social psychologist Rensis Likert, who developed it in 1932, it remains the most widely used approach to scaling responses in survey research today.</p>
<p>Key Takeaways</p>
<p>...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/likert-scale-questions-your-in-depth-guide/">Likert Scale Questions: Your In-Depth Guide</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rstudiodatalab.com/2023/07/Likert-Scale.html"> RStudioDataLab</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p style="text-align: justify;">
A <strong>Likert scale</strong> (pronounced <em>LICK-ert</em>, not &#8220;LIKE-ert&#8221;) is a psychometric rating scale used in surveys and questionnaires to measure attitudes, opinions, and perceptions. Named after American social psychologist <strong><a href="https://en.wikipedia.org/wiki/Rensis_Likert" rel="nofollow" target="_blank">Rensis Likert</a></strong>, who developed it in 1932, it remains the most widely used approach to scaling responses in survey research today.
</p>

<div style="background: rgb(240, 247, 255); border-left: 4px solid rgb(37, 99, 235); border-radius: 4px; margin: 20px 0px; padding: 16px 20px;">
<strong>Key Takeaways</strong>
<ul style="margin-top: 8px;">
<li><strong>Definition:</strong> A Likert scale measures how strongly people agree or disagree with a statement, typically using 5 or 7 ordered response options.</li>
<li><strong>Structure:</strong> Each item presents a statement followed by response options from &#8220;Strongly Disagree&#8221; to &#8220;Strongly Agree.&#8221;</li>
<li><strong>Purpose:</strong> Turns subjective opinions into quantitative data for statistical analysis.</li>
<li><strong>Formats:</strong> 4-point, 5-point, 6-point, and 7-point scales each serve different research needs.</li>
<li><strong>Analysis:</strong> Use the median and frequency tables for single items; use the mean for multi-item scale scores and Cronbach&#8217;s alpha to check their internal consistency.</li>
</ul>
</div>
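<p style="text-align: justify;">
As a rough illustration of the analysis advice above, the Python sketch below computes a median and frequency table for one item and Cronbach&#8217;s alpha for a small multi-item scale, using alpha = k/(k&#8722;1) &#215; (1 &#8722; sum of item variances / variance of total scores) with sample variances. The response matrix is invented, the items are assumed to be coded 1 to 5 and worded in the same direction, and this is not the author&#8217;s own method; in R, the <code>psych</code> package&#8217;s <code>alpha()</code> function is a common choice for the same statistic.
</p>

```python
from collections import Counter
from statistics import median, variance

# Hypothetical responses coded 1 (Strongly Disagree) to 5 (Strongly Agree).
# Each inner list is one respondent; each column is one item of the same scale.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]

def item_summary(values):
    """Median and frequency table (option -> count) for a single Likert item."""
    return median(values), dict(sorted(Counter(values).items()))

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items matrix (needs k >= 2 items)."""
    k = len(rows[0])
    items = list(zip(*rows))                      # transpose into per-item columns
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of each respondent's total
    return k / (k - 1) * (1 - item_vars / total_var)

first_item = [r[0] for r in responses]
print(item_summary(first_item))           # (4, {2: 1, 3: 1, 4: 2, 5: 1})
print(round(cronbach_alpha(responses), 3))  # 0.914
```

<p style="text-align: justify;">
The single-item summary stays within ordinal statistics, while alpha is only meaningful once several items are combined into a scale.
</p>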

<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEje02Hn87SzxgBr_lLBu7UeDc1qob7R_O8JN6cpwFqMacbvGomUp0Dcg_4EvC99ooP_mcQ6DSyYBGeIue47Jf71vm1GVRCh5eSE_lcE_os8oEZ88zrTxK1f2B22Ezs25OkKcPmtNrf0XImv8eBYZWShjM_cM9COW0-CVuVpVoj-wtzO_EY0hZkbBWymiJE/s1200/Feedback%20scale,%20satisfaction%20rating%20design%20(Instagram%20Post).webp" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank">
<img loading="lazy" alt="Likert scale rating options ranging from strongly disagree to strongly agree — complete guide with examples" border="0" data-original-height="640" data-original-width="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEje02Hn87SzxgBr_lLBu7UeDc1qob7R_O8JN6cpwFqMacbvGomUp0Dcg_4EvC99ooP_mcQ6DSyYBGeIue47Jf71vm1GVRCh5eSE_lcE_os8oEZ88zrTxK1f2B22Ezs25OkKcPmtNrf0XImv8eBYZWShjM_cM9COW0-CVuVpVoj-wtzO_EY0hZkbBWymiJE/w640-h342/Feedback%20scale,%20satisfaction%20rating%20design%20(Instagram%20Post).webp" title="Likert scale complete guide with examples and analysis" width="450" />
</a>
</div>

<h2>Likert Scale vs. Likert Item: An Important Distinction</h2>
<p style="text-align: justify;">
These two terms are often used interchangeably, but that is technically incorrect, and the distinction matters for your analysis.
</p>
<ul>
<li>A <strong>Likert item</strong> is a single statement with a rated response (e.g., &#8220;I am satisfied with the service: Strongly Disagree → Strongly Agree&#8221;).</li>
<li>A <strong>Likert scale</strong> is the sum or average of several related Likert items designed to measure a single construct.</li>
</ul>
<p style="text-align: justify;">
Treating a single item as a complete &#8220;scale&#8221; is one of the most common errors in survey design. If you have only one question, you have a Likert item, not a Likert scale — and the appropriate statistical treatment differs.
</p>
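<p style="text-align: justify;">
To make the item/scale distinction concrete, here is a minimal Python sketch of how a scale score is formed by averaging or summing several items for one respondent. The three item names and the 1 to 5 coding are hypothetical, and the code assumes all items are worded in the same direction (reverse-worded items would need recoding first).
</p>

```python
from statistics import mean

# Hypothetical three-item satisfaction scale; each item is coded 1-5.
ITEMS = ("service", "speed", "value")

def scale_score(answers, method="mean"):
    """Combine one respondent's item responses into a single scale score."""
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")
    missing = [item for item in ITEMS if item not in answers]
    if missing:
        raise ValueError(f"missing items: {missing}")
    values = [answers[item] for item in ITEMS]
    return mean(values) if method == "mean" else sum(values)

respondent = {"service": 4, "speed": 3, "value": 5}
print(scale_score(respondent))         # 4  (average of the three items)
print(scale_score(respondent, "sum"))  # 12 (sum of the three items)
```

<p style="text-align: justify;">
Averaging keeps the score on the original 1 to 5 metric, which is often easier to interpret; when no items are missing, summing ranks respondents in exactly the same order.
</p>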

<h2>Types of Likert Scale Response Options</h2>
<p style="text-align: justify;">
Likert scales are not limited to measuring agreement. Depending on your research objective, you can measure frequency, importance, quality, or likelihood using the same format. The table below shows the most common response option sets:
</p>

<div style="overflow-x: auto;">
<table style="border-collapse: collapse; font-size: 0.95em; width: 100%;">
<thead>
<tr style="background: rgb(37, 99, 235); color: white;">
<th style="padding: 10px 12px; text-align: left;">Dimension</th>
<th style="padding: 10px 12px;">Option 1</th>
<th style="padding: 10px 12px;">Option 2</th>
<th style="padding: 10px 12px;">Option 3</th>
<th style="padding: 10px 12px;">Option 4</th>
<th style="padding: 10px 12px;">Option 5</th>
</tr>
</thead>
<tbody>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">Agreement</td>
<td style="padding: 9px 12px; text-align: center;">Strongly Disagree</td>
<td style="padding: 9px 12px; text-align: center;">Disagree</td>
<td style="padding: 9px 12px; text-align: center;">Neither</td>
<td style="padding: 9px 12px; text-align: center;">Agree</td>
<td style="padding: 9px 12px; text-align: center;">Strongly Agree</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">Frequency</td>
<td style="padding: 9px 12px; text-align: center;">Never</td>
<td style="padding: 9px 12px; text-align: center;">Rarely</td>
<td style="padding: 9px 12px; text-align: center;">Sometimes</td>
<td style="padding: 9px 12px; text-align: center;">Often</td>
<td style="padding: 9px 12px; text-align: center;">Always</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">Importance</td>
<td style="padding: 9px 12px; text-align: center;">Not Important</td>
<td style="padding: 9px 12px; text-align: center;">Slightly Important</td>
<td style="padding: 9px 12px; text-align: center;">Moderately Important</td>
<td style="padding: 9px 12px; text-align: center;">Important</td>
<td style="padding: 9px 12px; text-align: center;">Very Important</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">Quality</td>
<td style="padding: 9px 12px; text-align: center;">Very Poor</td>
<td style="padding: 9px 12px; text-align: center;">Poor</td>
<td style="padding: 9px 12px; text-align: center;">Fair</td>
<td style="padding: 9px 12px; text-align: center;">Good</td>
<td style="padding: 9px 12px; text-align: center;">Excellent</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">Likelihood</td>
<td style="padding: 9px 12px; text-align: center;">Definitely Not</td>
<td style="padding: 9px 12px; text-align: center;">Probably Not</td>
<td style="padding: 9px 12px; text-align: center;">Possibly</td>
<td style="padding: 9px 12px; text-align: center;">Probably</td>
<td style="padding: 9px 12px; text-align: center;">Definitely</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">Satisfaction</td>
<td style="padding: 9px 12px; text-align: center;">Very Dissatisfied</td>
<td style="padding: 9px 12px; text-align: center;">Dissatisfied</td>
<td style="padding: 9px 12px; text-align: center;">Neutral</td>
<td style="padding: 9px 12px; text-align: center;">Satisfied</td>
<td style="padding: 9px 12px; text-align: center;">Very Satisfied</td>
</tr>
</tbody>
</table>
</div>
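<p style="text-align: justify;">
Before analysis, response labels like those in the table are usually converted to ordered numeric codes. The sketch below is one hypothetical way to do that in Python: each dimension&#8217;s labels are stored from lowest to highest, so a label&#8217;s position gives its code from 1 to 5. The dictionary keys and function names are illustrative, not part of any library.
</p>

```python
# Ordered response option sets from the table (index 0 = lowest option).
OPTION_SETS = {
    "agreement": ["Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree"],
    "frequency": ["Never", "Rarely", "Sometimes", "Often", "Always"],
    "importance": ["Not Important", "Slightly Important", "Moderately Important",
                   "Important", "Very Important"],
    "quality": ["Very Poor", "Poor", "Fair", "Good", "Excellent"],
    "likelihood": ["Definitely Not", "Probably Not", "Possibly", "Probably", "Definitely"],
    "satisfaction": ["Very Dissatisfied", "Dissatisfied", "Neutral", "Satisfied",
                     "Very Satisfied"],
}

def encode(label, dimension):
    """Return the ordinal code of a label: 1 for the lowest option up to 5 for the highest."""
    return OPTION_SETS[dimension].index(label) + 1  # raises ValueError for unknown labels

def decode(code, dimension):
    """Return the label for an ordinal code between 1 and the number of options."""
    options = OPTION_SETS[dimension]
    if not 1 <= code <= len(options):
        raise ValueError(f"code must be between 1 and {len(options)}, got {code}")
    return options[code - 1]

survey = ["Often", "Never", "Always", "Sometimes"]
codes = [encode(answer, "frequency") for answer in survey]
print(codes)  # [4, 1, 5, 3]
```

<p style="text-align: justify;">
Keeping the labels in one ordered list per dimension avoids hand-typed code books drifting out of sync with the questionnaire wording.
</p>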

<h2>Likert Scale Formats: 4, 5, 6, and 7 Points Compared</h2>
<p style="text-align: justify;">
Choosing the right number of response points affects the precision of your data and the cognitive load on respondents. Here is how each format differs in practice:
</p>

<div style="overflow-x: auto;">
<table style="border-collapse: collapse; font-size: 0.95em; margin-top: 12px; width: 100%;">
<thead>
<tr style="background: rgb(37, 99, 235); color: white;">
<th style="padding: 10px 12px; text-align: left;">Format</th>
<th style="padding: 10px 12px;">Neutral Point?</th>
<th style="padding: 10px 12px;">Best For</th>
<th style="padding: 10px 12px;">Main Trade-off</th>
</tr>
</thead>
<tbody>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">4-Point</td>
<td style="padding: 9px 12px; text-align: center;">No (forced choice)</td>
<td style="padding: 9px 12px;">When you need a clear directional opinion</td>
<td style="padding: 9px 12px;">Can frustrate genuinely neutral respondents</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">5-Point</td>
<td style="padding: 9px 12px; text-align: center;">Yes</td>
<td style="padding: 9px 12px;">Most general research; most familiar to respondents</td>
<td style="padding: 9px 12px;">Central tendency bias is common</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">6-Point</td>
<td style="padding: 9px 12px; text-align: center;">No (forced choice)</td>
<td style="padding: 9px 12px;">When you want fine-grained data without a fence-sitter option</td>
<td style="padding: 9px 12px;">Less intuitive labeling</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">7-Point</td>
<td style="padding: 9px 12px; text-align: center;">Yes</td>
<td style="padding: 9px 12px;">Academic research requiring maximum discrimination</td>
<td style="padding: 9px 12px;">Harder to label all points meaningfully</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">10-Point</td>
<td style="padding: 9px 12px; text-align: center;">Yes (implied midpoint)</td>
<td style="padding: 9px 12px;">NPS-style scoring; familiarity from school grades</td>
<td style="padding: 9px 12px;">Data often clusters; not true Likert by strict definition</td>
</tr>
</tbody>
</table>
</div>
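<p style="text-align: justify;">The &#8220;NPS-style scoring&#8221; in the last row refers to the Net Promoter Score. The Python sketch below assumes the conventional NPS setup: a 0 to 10 rating, promoters scoring 9 or 10, and detractors scoring 0 to 6. Under that setup the score is the percentage of promoters minus the percentage of detractors. The function name net_promoter_score is illustrative only.</p>

```python
def net_promoter_score(scores):
    """NPS on the conventional 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one score")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(scores)

print(net_promoter_score([10, 9, 9, 8, 6]))  # 40.0
```

<p style="text-align: justify;">The result ranges from -100 (every respondent a detractor) to +100 (every respondent a promoter). It is therefore a net difference, not an average rating.</p>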

<h3>5-Point Likert Scale (Most Common)</h3>
<p style="text-align: justify;">
The 5-point scale is the default choice in most survey research because it balances nuance with simplicity. Example:
</p>
<p style="text-align: justify;"><em>&#8220;The quality of food at XYZ Restaurant is excellent.&#8221;</em></p>
<ol>
<li>Strongly Disagree</li>
<li>Disagree</li>
<li>Neither Agree nor Disagree</li>
<li>Agree</li>
<li>Strongly Agree</li>
</ol>
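<p style="text-align: justify;">Before any analysis, the verbal labels are usually converted to numeric codes. The Python sketch below assumes the common coding from 1 = Strongly Disagree to 5 = Strongly Agree. The names FIVE_POINT and code_responses are hypothetical.</p>

```python
# Hypothetical coding scheme: map the five agreement labels to integer codes 1-5.
FIVE_POINT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def code_responses(labels):
    """Convert label strings to integer codes; an unknown label raises KeyError."""
    return [FIVE_POINT[label] for label in labels]

answers = ["Agree", "Strongly Agree", "Neither Agree nor Disagree", "Disagree"]
print(code_responses(answers))  # [4, 5, 3, 2]
```

<p style="text-align: justify;">The function raises an error on an unrecognized label instead of silently skipping it. That way, typos or inconsistent labels in exported survey data show up early.</p>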

<h3>4-Point Likert Scale (Forced Choice)</h3>
<p style="text-align: justify;">
Removing the neutral midpoint forces respondents to take a position. Use this when fence-sitting would undermine your research objective — for example, when measuring purchase intent or policy support where &#8220;no opinion&#8221; is not useful data.
</p>
<ol>
<li>Strongly Disagree</li>
<li>Disagree</li>
<li>Agree</li>
<li>Strongly Agree</li>
</ol>

<h3>6-Point Likert Scale</h3>
<p style="text-align: justify;">
Like the 4-point scale, this format eliminates the neutral option but provides more granularity. It is useful in employee satisfaction or consumer preference research where a clearer lean is needed.
</p>
<ol>
<li>Strongly Disagree</li>
<li>Disagree</li>
<li>Slightly Disagree</li>
<li>Slightly Agree</li>
<li>Agree</li>
<li>Strongly Agree</li>
</ol>

<h3>7-Point Likert Scale</h3>
<p style="text-align: justify;">
The 7-point scale is preferred in academic and psychological research where capturing subtle differences in attitude matters. The extra points can modestly improve discrimination and reliability, but each point needs a clear, distinct label.
</p>
<ol>
<li>Strongly Disagree</li>
<li>Moderately Disagree</li>
<li>Slightly Disagree</li>
<li>Neither Agree nor Disagree</li>
<li>Slightly Agree</li>
<li>Moderately Agree</li>
<li>Strongly Agree</li>
</ol>

<h2>When to Use a Likert Scale</h2>
<p style="text-align: justify;">
A Likert scale is the right tool when you need to measure characteristics that have no objective measurement — attitudes, opinions, satisfaction levels, or perceived likelihood. It is not appropriate when:
</p>
<ul>
<li>A simple yes/no question would fully answer your research question</li>
<li>You are measuring factual behaviors (e.g., &#8220;How many times per week do you exercise?&#8221; — use a numerical input instead)</li>
<li>Respondents lack sufficient knowledge of the topic to have a genuine opinion</li>
</ul>
<p style="text-align: justify;">
Use a Likert scale when you need to distinguish between degrees of agreement, not just direction. The difference between &#8220;Agree&#8221; and &#8220;Strongly Agree&#8221; often carries meaningful information in customer satisfaction and employee engagement research.
</p>

<h2>How to Design an Effective Likert Scale</h2>

<h3>Write Clear, Single-Focus Statements</h3>
<p style="text-align: justify;">
Each item must address exactly one idea. A statement like &#8220;The service was fast and the staff were friendly&#8221; is a double-barreled item — the respondent may agree with one half and disagree with the other, making their response uninterpretable.
</p>

<h3>Balance Your Scale</h3>
<p style="text-align: justify;">
A well-designed Likert scale includes an equal number of positively and negatively worded items. This counteracts <strong>acquiescence bias</strong> — the tendency of some respondents to agree with statements regardless of content. If all your items are positive, respondents who habitually agree will appear more satisfied than they actually are.
</p>
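<p style="text-align: justify;">Negatively worded items only counteract acquiescence if they are reverse-scored before items are combined. On a scale coded 1 to k, a common approach is to replace each response x with (k + 1) - x. The Python sketch below assumes 1-based integer codes. The item names q1 to q4, and the choice of which items are negatively worded, are hypothetical.</p>

```python
def reverse_score(value, points=5):
    """Reverse-code a response on a 1..points scale (1 -> 5, 2 -> 4 on a 5-point scale)."""
    if not 1 <= value <= points:
        raise ValueError(f"response {value} outside 1..{points}")
    return points + 1 - value

# Hypothetical respondent: items q2 and q4 are negatively worded.
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
negative_items = {"q2", "q4"}
aligned = {item: reverse_score(v) if item in negative_items else v
           for item, v in responses.items()}
print(aligned)  # {'q1': 4, 'q2': 4, 'q3': 5, 'q4': 5}
```

<p style="text-align: justify;">After this step, a higher code means a more favorable attitude on every item. This is what makes summing or averaging the items meaningful.</p>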

<h3>Avoid Leading Language</h3>
<p style="text-align: justify;">
Avoid adverbs like &#8220;very,&#8221; &#8220;extremely,&#8221; or &#8220;always&#8221; inside the item statement itself. &#8220;This website is extremely fast&#8221; tends to draw fewer &#8220;Strongly Agree&#8221; responses than &#8220;This website is fast.&#8221; Respondents do not think differently; the bar for agreeing is simply higher.
</p>

<h3>Keep the Scale Consistent Throughout the Survey</h3>
<p style="text-align: justify;">
Switching between a 5-point and a 7-point scale in the same questionnaire forces respondents to mentally reset and makes response errors more likely. Choose one format and use it throughout.
</p>

<h2>Likert Scale Response Bias: What Can Distort Your Data</h2>
<p style="text-align: justify;">
Understanding bias is not optional for anyone analyzing Likert data — it directly affects whether your conclusions are valid.
</p>

<div style="overflow-x: auto;">
<table style="border-collapse: collapse; font-size: 0.95em; margin-top: 12px; width: 100%;">
<thead>
<tr style="background: rgb(37, 99, 235); color: white;">
<th style="padding: 10px 12px; text-align: left;">Bias Type</th>
<th style="padding: 10px 12px;">What Happens</th>
<th style="padding: 10px 12px;">How to Reduce It</th>
</tr>
</thead>
<tbody>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">Acquiescence bias</td>
<td style="padding: 9px 12px;">Respondents agree with statements regardless of content</td>
<td style="padding: 9px 12px;">Include negatively worded items; balance scale direction</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">Central tendency bias</td>
<td style="padding: 9px 12px;">Respondents cluster around the midpoint, avoiding extremes</td>
<td style="padding: 9px 12px;">Use an even-point scale to remove the neutral option when appropriate</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="font-weight: 600; padding: 9px 12px;">Social desirability bias</td>
<td style="padding: 9px 12px;">Respondents choose the answer they think is most socially acceptable</td>
<td style="padding: 9px 12px;">Ensure anonymity; frame items neutrally</td>
</tr>
<tr>
<td style="font-weight: 600; padding: 9px 12px;">Extreme response bias</td>
<td style="padding: 9px 12px;">Some respondents always select the most extreme option</td>
<td style="padding: 9px 12px;">Use more scale points (7-point) to better distinguish genuine extremes</td>
</tr>
</tbody>
</table>
</div>
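<p style="text-align: justify;">Some of the biases in the table can be roughly screened for before analysis. The Python sketch below assumes raw answers (not yet reverse-scored) coded 1 to k on an odd-point scale. The three shares it returns are simple descriptive indicators chosen for illustration; they are not validated bias measures.</p>

```python
def response_style_indices(raw, points=5):
    """Rough per-respondent screening shares, computed on RAW (not reverse-scored) answers.

    midpoint: share of neutral answers (central tendency)
    extreme:  share of answers at either end of the scale (extreme responding)
    agree:    share of answers above the midpoint (acquiescence, if items are balanced)
    """
    if points % 2 == 0:
        raise ValueError("a midpoint only exists on an odd-point scale")
    if not raw:
        raise ValueError("need at least one answer")
    mid = (points + 1) // 2
    n = len(raw)
    return {
        "midpoint": sum(r == mid for r in raw) / n,
        "extreme": sum(r in (1, points) for r in raw) / n,
        "agree": sum(r > mid for r in raw) / n,
    }

# Hypothetical "straight-liner" who picked the neutral option on every item.
print(response_style_indices([3, 3, 3, 3]))
# {'midpoint': 1.0, 'extreme': 0.0, 'agree': 0.0}
```

<p style="text-align: justify;">A midpoint share of 1.0 means the respondent chose the neutral option on every item, so their answers may deserve a closer look. The threshold that counts as suspicious remains a judgment call for each study.</p>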

<h2>How to Analyze Likert Scale Data</h2>

<h3>Single Item vs. Multi-Item Scale: Different Rules Apply</h3>
<p style="text-align: justify;">
This is the most commonly misunderstood part of Likert analysis. A single Likert item produces <strong>ordinal data</strong> — the intervals between response options are not guaranteed to be equal. Calculating a mean on ordinal data is statistically questionable. For a single item, use:
</p>
<ul>
<li><strong>Median</strong> as your measure of central tendency</li>
<li><strong>Frequency tables and percentages</strong> for distribution</li>
<li><strong>Chi-square tests</strong> or <strong>Mann-Whitney U</strong> for group comparisons</li>
</ul>
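<p style="text-align: justify;">For a single item, the median, a full frequency table, and the Mann-Whitney U statistic can all be computed from the coded responses. The Python sketch below uses only the standard library, and the two groups of answers are made-up data. It computes U by direct pairwise comparison (ties count as 0.5) and does not produce a p-value. In practice, scipy.stats.mannwhitneyu in Python or wilcox.test() in R returns the p-value as well.</p>

```python
from collections import Counter
from statistics import median

def frequency_table(codes, points=5):
    """(count, percent) for every scale point, including points nobody chose."""
    counts = Counter(codes)
    n = len(codes)
    return {p: (counts[p], round(100 * counts[p] / n, 1)) for p in range(1, points + 1)}

def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs where a is higher, ties counted as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

# Hypothetical coded answers (1-5) to one item from two groups.
group_a = [4, 5, 4, 3, 5]
group_b = [2, 3, 4, 2, 3]
print(median(group_a), median(group_b))   # 4 3
print(frequency_table(group_a))
print(mann_whitney_u(group_a, group_b))   # 22.0
```

<p style="text-align: justify;">The two U values for a pair of groups always add up to n1 × n2 (here 22 + 3 = 25). A U far from half of that product suggests one group tends to answer higher than the other.</p>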
<p style="text-align: justify;">
A full Likert scale (summed or averaged across multiple items) behaves more like interval data, especially with 5+ items and a reasonable sample size. In this case, parametric statistics become more defensible:
</p>
<ul>
<li><strong>Mean and standard deviation</strong> for descriptive summaries</li>
<li><strong>Cronbach&#8217;s alpha (α)</strong> to test internal consistency — aim for α > 0.7</li>
<li><strong>t-tests or ANOVA</strong> for group comparisons</li>
<li><strong>Pearson correlation</strong> (or the rank-based <strong>Spearman correlation</strong> as a more conservative choice) for relationships between Likert scores and other variables</li>
</ul>
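<p style="text-align: justify;">The Python sketch below covers these descriptive steps. It assumes each respondent's items are already coded 1 to 5 and reverse-scored where needed. Each respondent's scale score is the average of their items. Spearman's rho is computed as the Pearson correlation of average ranks, so tied scores are handled. The data and the tenure_years variable are hypothetical; in R, cor(x, y, method = "spearman") gives the same coefficient.</p>

```python
from statistics import mean, stdev

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: one row per respondent, five items, already reverse-scored.
items = [
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
]
scale_scores = [mean(row) for row in items]   # averaged scale score per respondent
print(mean(scale_scores), stdev(scale_scores))
tenure_years = [6, 1, 8, 3]                   # hypothetical second variable
print(spearman(scale_scores, tenure_years))   # 0.2
```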

<h3>Cronbach&#8217;s Alpha: Checking if Your Scale Holds Together</h3>
<p style="text-align: justify;">
If you are using multiple Likert items to measure the same construct, run Cronbach&#8217;s alpha before reporting results. An alpha above 0.8 is usually read as strong internal consistency, and values between 0.7 and 0.8 as acceptable. An alpha below 0.7 suggests the items may not be measuring the same construct. In that case, revise or remove items with low item-total correlations. Alpha also tends to rise as items are added, so interpret it alongside the number of items in the scale.
</p>
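<p style="text-align: justify;">For k items, Cronbach&#8217;s alpha is k / (k - 1) × (1 - sum of item variances / variance of total scores). The Python sketch below implements that formula and assumes complete data (no missing answers), with negatively worded items already reverse-scored. As a companion diagnostic, it also computes alpha with each item deleted in turn. That statistic is a swapped-in alternative to the item-total correlations mentioned above, not the same quantity. The responses are hypothetical. In R, the psych package&#8217;s alpha() function reports alpha along with item-level statistics.</p>

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (reverse-scored where needed)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*rows))                        # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])  # variance of summed scale scores
    # total_var is zero if every respondent has the same total; alpha is undefined then.
    return k / (k - 1) * (1 - item_var_sum / total_var)

def alpha_if_deleted(rows):
    """Alpha recomputed with each item dropped in turn (needs at least three items)."""
    k = len(rows[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in rows]) for j in range(k)]

# Hypothetical responses: 4 respondents x 5 items, coded 1-5.
rows = [
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
]
print(round(cronbach_alpha(rows), 3))
print([round(a, 3) for a in alpha_if_deleted(rows)])
```

<p style="text-align: justify;">If alpha clearly rises when one item is dropped, that item is a candidate for revision or removal.</p>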

<h3>Likert Scale Examples Across Research Domains</h3>

<p style="text-align: justify;"><strong>Customer satisfaction survey:</strong></p>
<p style="text-align: justify;"><em>&#8220;How satisfied are you with the cleanliness of our facilities?&#8221;</em></p>
<ul>
<li>Very Dissatisfied</li>
<li>Dissatisfied</li>
<li>Neither Satisfied nor Dissatisfied</li>
<li>Satisfied</li>
<li>Very Satisfied</li>
</ul>

<p style="text-align: justify;"><strong>Employee engagement survey:</strong></p>
<p style="text-align: justify;"><em>&#8220;To what extent do you agree: &#8216;The new company policy enhances employee productivity&#8217;?&#8221;</em></p>
<ul>
<li>Strongly Disagree</li>
<li>Disagree</li>
<li>Neither Agree nor Disagree</li>
<li>Agree</li>
<li>Strongly Agree</li>
</ul>

<p style="text-align: justify;"><strong>Online UX research:</strong></p>
<p style="text-align: justify;"><em>&#8220;Rate your agreement: &#8216;The online shopping experience was user-friendly and intuitive.'&#8221;</em></p>
<ul>
<li>Strongly Disagree</li>
<li>Disagree</li>
<li>Neither Agree nor Disagree</li>
<li>Agree</li>
<li>Strongly Agree</li>
</ul>

<h2>Advantages and Disadvantages of Likert Scales</h2>

<div style="overflow-x: auto;">
<table style="border-collapse: collapse; font-size: 0.95em; margin-top: 12px; width: 100%;">
<thead>
<tr style="background: rgb(37, 99, 235); color: white;">
<th style="padding: 10px 12px; text-align: left;">Advantages</th>
<th style="padding: 10px 12px; text-align: left;">Disadvantages</th>
</tr>
</thead>
<tbody>
<tr style="background: rgb(249, 250, 251);">
<td style="padding: 9px 12px;">Easy for respondents to understand and complete</td>
<td style="padding: 9px 12px;">Prone to acquiescence and social desirability bias</td>
</tr>
<tr>
<td style="padding: 9px 12px;">Produces quantitative data from subjective opinions</td>
<td style="padding: 9px 12px;">Ordinal data is not strictly interval — mean can be misleading</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="padding: 9px 12px;">Flexible: measures agreement, frequency, satisfaction, likelihood</td>
<td style="padding: 9px 12px;">Central tendency bias reduces discrimination</td>
</tr>
<tr>
<td style="padding: 9px 12px;">Widely understood — high response rates</td>
<td style="padding: 9px 12px;">A single item cannot represent a full scale</td>
</tr>
<tr style="background: rgb(249, 250, 251);">
<td style="padding: 9px 12px;">Supports statistical analysis across groups</td>
<td style="padding: 9px 12px;">Does not capture why a respondent chose a particular point</td>
</tr>
</tbody>
</table>
</div>

<h2>Conclusion</h2>
<p style="text-align: justify;">
A Likert scale is one of the most versatile and reliable tools in survey research — when used correctly. The key decisions are choosing the right number of response points for your research goal, writing items that are balanced and unambiguous, and applying the correct statistical method depending on whether you are working with a single item or a multi-item scale. Whether you are measuring customer satisfaction, employee engagement, student attitudes, or any other opinion-based construct, the principles remain the same: clarity in item wording, consistency in format, and honesty about what ordinal data can and cannot tell you.
</p>


<h2>Frequently Asked Questions</h2>

<p style="text-align: justify;"><strong>Q: What is the difference between a Likert scale and a Likert item?</strong></p>
<p style="text-align: justify;">A: A Likert item is a single rated statement. A Likert scale is the aggregate of multiple related items. The distinction matters for analysis: single items should use median and nonparametric tests; full scales can use mean and parametric tests.</p>

<p style="text-align: justify;"><strong>Q: How do you pronounce Likert?</strong></p>
<p style="text-align: justify;">A: The correct pronunciation is &#8220;LICK-ert,&#8221; not &#8220;LIKE-ert.&#8221; It is named after Rensis Likert, who created the scale in 1932.</p>

<p style="text-align: justify;"><strong>Q: Should I use a 5-point or 7-point Likert scale?</strong></p>
<p style="text-align: justify;">A: For general surveys and applied research, a 5-point scale is easier for respondents and produces reliable results. For academic or psychological research where detecting subtle attitude differences matters, a 7-point scale can offer somewhat finer discrimination. Research comparing 5-point and 7-point scales has found that both produce similar mean scores once rescaled to a common range. The choice therefore depends more on respondent context than on statistical superiority.</p>
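<p style="text-align: justify;">Means from scales of different lengths can only be compared after they are put on a common range. One common choice, assumed here, is to map a response coded 1 to k linearly onto 0 to 100, sometimes called a POMP (percent of maximum possible) score. In the Python sketch below, the function name and the data are illustrative only.</p>

```python
from statistics import mean

def to_percent_of_max(score, points):
    """Linearly map a response coded 1..points onto 0-100 (1 -> 0, points -> 100)."""
    if points < 2 or not 1 <= score <= points:
        raise ValueError("score must lie within 1..points, with points >= 2")
    return 100 * (score - 1) / (points - 1)

# Hypothetical answers to the same item asked on a 5-point and a 7-point scale.
five_point = [4, 5, 3, 4]
seven_point = [5, 7, 4, 6]
print(mean(to_percent_of_max(s, 5) for s in five_point))   # 75.0
print(mean(to_percent_of_max(s, 7) for s in seven_point))
```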

<p style="text-align: justify;"><strong>Q: Can you calculate the mean from Likert scale data?</strong></p>
<p style="text-align: justify;">A: For a single Likert item, strictly speaking no. The data is ordinal, so the median is more appropriate. For a complete Likert scale (multiple items summed or averaged), calculating the mean is widely practiced and generally acceptable, especially when the sample size is above 30 and the scale scores are roughly normally distributed.</p>

<p style="text-align: justify;"><strong>Q: What is acquiescence bias in Likert scales?</strong></p>
<p style="text-align: justify;">A: Acquiescence bias is the tendency of some respondents to agree with statements regardless of content. It is reduced by including both positively and negatively worded items in your scale. Once the negatively worded items are reverse-scored, habitual agreement with one item is offset by habitual agreement with an item worded in the opposite direction.</p>

<p style="text-align: justify;"><strong>Q: Are Likert scale questions suitable for all types of research?</strong></p>
<p style="text-align: justify;">A: Likert scales work well in social sciences, market research, psychology, education research, healthcare, and UX research. They are not appropriate when you need objective behavioral counts or factual data — use open-ended questions or numerical inputs for those cases.</p>

<p style="text-align: justify;"><strong>Q: Is it necessary to include a neutral response option in a Likert scale?</strong></p>
<p style="text-align: justify;">A: No. Including a neutral option (odd-point scale) allows genuinely ambivalent respondents to express that accurately. Removing it (even-point scale) forces a directional choice, which can reduce central tendency bias but may frustrate respondents who truly have no strong view. Choose based on whether neutrality is meaningful in your research context.</p>

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "BlogPosting",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.rstudiodatalab.com/2023/07/Likert-Scale.html"
  },
  "headline": "Likert Scale: Definition, Examples & Complete Guide",
  "description": "Learn what a Likert scale is, see examples of 4, 5, 6, and 7-point formats, and find out how to design, use, and analyze one. Free template included.",
  "image": {
    "@type": "ImageObject",
    "url": "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEje02Hn87SzxgBr_lLBu7UeDc1qob7R_O8JN6cpwFqMacbvGomUp0Dcg_4EvC99ooP_mcQ6DSyYBGeIue47Jf71vm1GVRCh5eSE_lcE_os8oEZ88zrTxK1f2B22Ezs25OkKcPmtNrf0XImv8eBYZWShjM_cM9COW0-CVuVpVoj-wtzO_EY0hZkbBWymiJE/s1200/Feedback%20scale,%20satisfaction%20rating%20design%20(Instagram%20Post).webp",
    "width": "1200",
    "height": "640"
  },
  "author": {
    "@type": "Person",
    "name": "Zubair Goraya"
  },
  "publisher": {
    "@type": "Organization",
    "name": "RStudio Data Lab",
    "logo": {
      "@type": "ImageObject",
      "url": "https://blogger.googleusercontent.com/img/a/AVvXsEiCXGi5dTbTiGMtIoMBoUIdBLlK8YKecPFRRFjiZxHQUt5TTE2CZV4mYhDxAglSpQiGmiugEqp2kkDIzYAjCdGvRe3S3ms4gYzXgcb1EKzOIqj6h2JNExebJ4_NE5Ch4BGTECKIxrM63T6lNElmxUbzwKSNEtQ8VFt-oy4iuH8feMki83oJG6GgBQIb4t0=w200",
      "width": "200",
      "height": "50"
    }
  },
  "datePublished": "2023-07-16",
  "dateModified": "2026-06-18"
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the difference between a Likert scale and a Likert item?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A Likert item is a single rated statement. A Likert scale is the aggregate of multiple related items. Single items should use median and nonparametric tests; full scales can use mean and parametric tests."
      }
    },
    {
      "@type": "Question",
      "name": "How do you pronounce Likert?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The correct pronunciation is LICK-ert, not LIKE-ert. It is named after Rensis Likert, who created the scale in 1932."
      }
    },
    {
      "@type": "Question",
      "name": "Should I use a 5-point or 7-point Likert scale?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "For general surveys, a 5-point scale is easier for respondents. For academic research requiring fine discrimination between attitudes, a 7-point scale is preferred. Both produce similar mean scores once rescaled."
      }
    },
    {
      "@type": "Question",
      "name": "Can you calculate the mean from Likert scale data?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "For a single Likert item, the median is more appropriate because the data is ordinal. For a multi-item Likert scale with a reasonable sample size, calculating the mean is widely accepted in practice."
      }
    },
    {
      "@type": "Question",
      "name": "What is acquiescence bias in Likert scales?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Acquiescence bias is the tendency of respondents to agree with statements regardless of content. It is reduced by including both positively and negatively worded items in the same scale."
      }
    },
    {
      "@type": "Question",
      "name": "Is it necessary to include a neutral response option in a Likert scale?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No. An odd-point scale includes a neutral midpoint; an even-point scale forces a directional choice. The decision depends on whether genuine neutrality is meaningful in your research context."
      }
    }
  ]
}
</script>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rstudiodatalab.com/2023/07/Likert-Scale.html"> RStudioDataLab</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/likert-scale-questions-your-in-depth-guide/">Likert Scale Questions: Your In-Depth Guide</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402108</post-id>	</item>
		<item>
		<title>2026 Rousseeuw Prize for Statistics Awarded to R Core Team for Transforming Statistics Computing Worldwide</title>
		<link>https://www.r-bloggers.com/2026/06/2026-rousseeuw-prize-for-statistics-awarded-to-r-core-team-for-transforming-statistics-computing-worldwide/</link>
		
		<dc:creator><![CDATA[Lauren Livingston]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 05:09:44 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=19312</guid>

					<description><![CDATA[<p>The Rousseeuw Prize honors five pioneering developers for nearly three decades of unpaid work building R, the foundational open-source computing language behind artificial intelligence, healthcare, and economic decision-making. The $1 million Rousseeuw Prize for Statistics recognizes three decades of foundational work that transformed how statistical methods are developed, validated, and shared ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/2026-rousseeuw-prize-for-statistics-awarded-to-r-core-team-for-transforming-statistics-computing-worldwide/">2026 Rousseeuw Prize for Statistics Awarded to R Core Team for Transforming Statistics Computing Worldwide</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/2026-rousseeuw-prize-for-statistics-awarded-to-r-core-team-for-transforming-statistics-computing-worldwide/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p><i><span style="font-weight: 400">The Rousseeuw Prize honors five pioneering developers for nearly three decades of unpaid work building R, the foundational open-source computing language behind artificial intelligence, healthcare, and economic decision-making.</span></i></p>
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">The $1 million Rousseeuw Prize for Statistics recognizes three decades of foundational work that transformed how statistical methods are developed, validated, and shared globally.</span></li>
	<li style="font-weight: 400"><span style="font-weight: 400">R, the open-source statistical computing language, underpins modern AI development, pharmaceutical research, financial modeling, and global scientific analysis.</span></li>
	<li style="font-weight: 400"><span style="font-weight: 400">Used by organizations including the U.S. Food and Drug Administration, major pharmaceutical companies, and global central banks, R has become the trusted infrastructure for high-stakes analysis because it is stable, auditable, and reproducible.</span></li>
</ul>
<p><b>NEW YORK – June 17, 2026 </b><span style="font-weight: 400">— Five members of the R Core Team have been awarded the prestigious </span><a href="https://www.rousseeuwprize.org/" rel="nofollow" target="_blank"><span style="font-weight: 400">Rousseeuw Prize for Statistics</span></a><span style="font-weight: 400"> for their decades of work building and maintaining R, the free and open-source statistical computing language developed by the </span><a href="https://www.r-project.org/" rel="nofollow" target="_blank"><span style="font-weight: 400">R Project</span></a><span style="font-weight: 400"> and used across global research institutions, healthcare systems, financial organizations, and technology companies. The Rousseeuw Prize is an international award recognizing major contributions to statistical research. </span></p>
<p><span style="font-weight: 400">The 2026 </span><span style="font-weight: 400">Rousseeuw Prize </span><span style="font-weight: 400">honorees are:</span></p>
<ul>
	<li style="font-weight: 400"><span style="font-weight: 400">Brian D. Ripley, emeritus professor at the University of Oxford</span></li>
	<li style="font-weight: 400"><span style="font-weight: 400">Martin Maechler, emeritus professor at ETH Zurich</span></li>
	<li style="font-weight: 400"><span style="font-weight: 400">Kurt Hornik, department chair at WU Vienna University of Economics and Business</span></li>
	<li style="font-weight: 400"><span style="font-weight: 400">Peter Dalgaard, professor at Copenhagen Business School</span></li>
	<li style="font-weight: 400"><span style="font-weight: 400">Luke Tierney, professor at the University of Iowa</span></li>
</ul>
<p><span style="font-weight: 400">The five laureates receive half of the prize money because they are deemed to have made the longest sustained contributions to the R project; the other half of the prize is shared among the many others who have been active on the R Core Team.</span></p>
<p><span style="font-weight: 400">Over the last 27 years, the R Project volunteers have together put a collective 28,000 coding hours into R. They developed an open-source programming language and software environment that transformed statistics from a proprietary corporate tool into a global public good. The software is relied upon by organizations including the U.S. Food and Drug Administration, pharmaceutical companies, and central banks such as the European Central Bank and the Bank of England.</span></p>
<p><span style="font-weight: 400">The award recognizes the team’s role in making advanced statistical tools widely accessible. By keeping R free and open-source under the GNU General Public License, the R Core Team removed many of the financial barriers that have historically limited access to advanced analytics software. Thanks to this accessibility, hundreds of thousands of users around the world can use the same statistical tools regardless of institutional resources. These users include researchers, students, hospitals, public health organizations, and governments. Users also share transcripts of their R data analyses, so one user’s workflow can power other users’ analyses anywhere in the world. The frictionless spread of these transcripts has powered countless educational data science projects and hundreds of course textbooks at the Master’s and PhD level. In a recent twist, it is not only humans who use R. AI data analyst “agents” have learned from the massive volume of published R transcripts and can now assist with many everyday data analysis tasks. </span></p>
<p><span style="font-weight: 400">“Long before AI became a global conversation, the R Core Team was building the statistical infrastructure that made today’s data-driven world possible,” said Stanford University statistics professor and leading statistician David Donoho, PhD. “This team’s stewardship of R created an open and trusted foundation for research across disciplines and continents. Few innovations have had such a profound effect on how knowledge is produced, shared, and validated in the modern era.”</span></p>
<p><span style="font-weight: 400">Named after Professor Peter Rousseeuw, a pioneering Belgian statistician known for his foundational work in robust statistics and data analysis, the Rousseeuw Prize for Statistics recognizes innovations that have transformed the understanding and application of data for the benefit of society. Past laureates include internationally renowned statisticians and researchers whose work has advanced fields ranging from epidemiology and artificial intelligence to public policy and scientific discovery.</span></p>
<p><span style="font-weight: 400">For more information, visit </span><a href="https://www.rousseeuwprize.org/" rel="nofollow" target="_blank"><span style="font-weight: 400">https://www.rousseeuwprize.org/</span></a><span style="font-weight: 400">.</span></p>
<p><span style="font-weight: 400">###</span></p>
<p><span style="font-weight: 400">Media Contact:</span></p>
<p><a href="mailto:rousseeuwprize@ampublicrelations.com" rel="nofollow" target="_blank"><span style="font-weight: 400">rousseeuwprize@ampublicrelations.com</span></a></p><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/2026-rousseeuw-prize-for-statistics-awarded-to-r-core-team-for-transforming-statistics-computing-worldwide/" rel="nofollow" target="_blank">2026 Rousseeuw Prize for Statistics Awarded to R Core Team for Transforming Statistics Computing Worldwide</a> was first posted on June 18, 2026 at 5:09 am.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/2026-rousseeuw-prize-for-statistics-awarded-to-r-core-team-for-transforming-statistics-computing-worldwide/"> R-posts.com</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/2026-rousseeuw-prize-for-statistics-awarded-to-r-core-team-for-transforming-statistics-computing-worldwide/">2026 Rousseeuw Prize for Statistics Awarded to R Core Team for Transforming Statistics Computing Worldwide</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402090</post-id>	</item>
		<item>
		<title>{talib}: Interactive financial charts</title>
		<link>https://www.r-bloggers.com/2026/06/talib-interactive-financial-charts/</link>
		
		<dc:creator><![CDATA[Serkan Korkmaz]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 05:09:11 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=19168</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> {talib} is a new R package built on TA-Lib, which is now available on CRAN. The R-package is targeted at individuals and, perhaps, institutions who, in some form or another, interact with the financial markets using technical analysis. The library is built with minimal dependencies for long-term stability and ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/talib-interactive-financial-charts/">{talib}: Interactive financial charts</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/talib-interactive-financial-charts/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>





<p><a href="https://github.com/serkor1/ta-lib-R" rel="nofollow" target="_blank">{talib}</a> is a new
<code>R</code> package built on <a href="https://github.com/TA-Lib/ta-lib" rel="nofollow" target="_blank">TA-Lib</a>, which is now
available on CRAN. The <code>R</code>-package is targeted at individuals
and, perhaps, institutions who, in some form or another, interact
with the financial markets using technical analysis.</p>
<p>The library is built with minimal dependencies, with long-term
stability and freedom in mind. All functions are built around the
<code>data.frame</code> and <code>matrix</code> classes, which are
portable to other data containers with minimal effort.</p>
<p>Everything in the library is built ‘bottom-up’ for maximum speed and
memory efficiency. Each indicator interacts directly with R’s C API via
<code>.Call()</code>.</p>
<p>In this blog post I will give a brief introduction to the charting
interface, which is built to mimic the behaviour of base
<code>R</code>’s plotting API.</p>
<div id="a-quick-introduction-to-charts" class="section level2">
<h2>A quick introduction to charts</h2>
<p>In this section I will briefly introduce the most important aspects
of charting: ‘quality of life’ features and themes. Below is a
simple starting point, charting BTC:</p>
<pre>talib::chart(
  talib::BTC
)</pre>
<p><img decoding="async" src="https://i2.wp.com/i.imgur.com/sgaplrf.png?w=578&#038;ssl=1" data-recalc-dims="1" /><!-- --></p>
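<p><code>chart()</code> returns a candlestick chart by default. Below
are its <code>formals</code>: <code>x</code> and <code>title</code> have no default (shown as empty symbols), while <code>type</code> defaults to <code>&quot;candlestick&quot;</code> and <code>idx</code> to <code>NULL</code>:</p>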
<pre>str(formals(talib::chart))
#&gt; Dotted pair list of 5
#&gt;  $ x    : symbol 
#&gt;  $ type : chr &quot;candlestick&quot;
#&gt;  $ idx  : NULL
#&gt;  $ title: symbol 
#&gt;  $ ...  : symbol</pre>
<div id="modifying-themes" class="section level3">
<h3>Modifying themes</h3>
<pre>talib::set_theme(&quot;hawks_and_doves&quot;)

talib::chart(
  talib::BTC
)</pre>
<p><img decoding="async" src="https://i2.wp.com/i.imgur.com/EqDWSqw.png?w=578&#038;ssl=1" data-recalc-dims="1" /><!-- --></p>
</div>
</div>
<div id="charting-indicators" class="section level2">
<h2>Charting indicators</h2>
<pre>{
  talib::chart(talib::BTC)
  talib::indicator(talib::SMA, n = 7)
  talib::indicator(talib::SMA, n = 14)
  talib::indicator(talib::SMA, n = 21)
  talib::indicator(talib::SMA, n = 28)
  talib::indicator(talib::MACD)
  talib::indicator(talib::trading_volume)
}</pre>
<p><img decoding="async" src="https://i1.wp.com/i.imgur.com/ixQ9OhQ.png?w=578&#038;ssl=1" data-recalc-dims="1" /><!-- --></p>
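<p>For readers new to these indicators, the sketch below shows what a trailing simple moving average (as in <code>SMA</code> with <code>n = 7</code>) and an exponential moving average (the building block of <code>MACD</code>) compute. It is a plain base-R illustration of the textbook formulas, not the C code that <code>{talib}</code> calls. The helper names <code>sma()</code> and <code>ema()</code> are hypothetical, and seeding the EMA with the SMA of the first <code>n</code> values is an assumption, since implementations differ in how they start the recursion.</p>
<pre># Hypothetical helpers illustrating the formulas; not part of {talib}.

# Trailing simple moving average: the mean of the last n observations.
# The first n - 1 values are NA because the window is not yet full.
sma = function(x, n) {
  stopifnot(is.numeric(x), length(n) == 1, n %in% seq_along(x))
  # stats::filter is namespaced because dplyr::filter would mask it
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Exponential moving average with smoothing factor alpha = 2 / (n + 1).
# Assumption: the recursion is seeded with the SMA of the first n values.
ema = function(x, n) {
  stopifnot(is.numeric(x), length(n) == 1, n %in% seq_along(x))
  alpha = 2 / (n + 1)
  out = rep(NA_real_, length(x))
  out[n] = mean(x[seq_len(n)])
  for (i in seq_along(x)[-seq_len(n)]) {
    out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

prices = c(10, 11, 12, 13, 12, 11, 12, 14)
sma(prices, 3)
ema(prices, 3)</pre>
<p>The MACD line is the difference between a fast and a slow EMA of the closing price (conventionally 12 and 26 periods), usually shown together with a 9-period EMA of itself as the signal line.</p>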
</div>
<div id="installation" class="section level2">
<h2>Installation</h2>
<p><code>{talib}</code> is finally on CRAN, and can be installed as
follows:</p>
<pre>install.packages(&quot;talib&quot;)</pre>
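<p>It can also be built from source with additional
<code>CMake</code> flags:</p>
<pre># Build from source; the flags are passed on to the package's configure step.
# Note: -march=native tunes the binary for the CPU it is compiled on,
# so the result may not run on other machines.
install.packages(
  &quot;talib&quot;,
  type = &quot;source&quot;,
  configure.args = &quot;-O3 -march=native&quot;
)</pre>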
</div>
<div id="contributing-and-submitting-bug-reports" class="section level2">
<h2>Contributing and submitting bug reports</h2>
<p><code>{talib}</code> is still at an early stage, so contributions
(even small ones), bug reports, suggestions and critiques are gratefully
accepted.</p>
<p>Visit the repository here: <a href="https://github.com/serkor1/ta-lib-R" class="uri" rel="nofollow" target="_blank">https://github.com/serkor1/ta-lib-R</a>.</p>
<p><sup>Created on 2026-04-29 with <a href="https://reprex.tidyverse.org/" rel="nofollow" target="_blank">reprex v2.1.1</a></sup></p>
</div><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/talib-interactive-financial-charts/" rel="nofollow" target="_blank">{talib}: Interactive financial charts</a> was first posted on June 18, 2026 at 5:09 am.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/talib-interactive-financial-charts/"> R-posts.com</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/talib-interactive-financial-charts/">{talib}: Interactive financial charts</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402092</post-id>	</item>
		<item>
		<title>Announcing shiny.webawesome: a web UI package for R/Shiny</title>
		<link>https://www.r-bloggers.com/2026/06/announcing-shiny-webawesome-a-web-ui-package-for-r-shiny/</link>
		
		<dc:creator><![CDATA[M. B. Anand]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 05:09:03 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=19179</guid>

					<description><![CDATA[<p>shiny.webawesome brings Web Awesome to R Shiny through generated wrappers, reactive bindings, and a bundled runtime. It aims for complete component support while staying close enough to upstream that the Web Awesome docs and examples are directly useful in everyday package use. CRAN &#124; R-universe &#124; Package website &#124; Source repository Background ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/announcing-shiny-webawesome-a-web-ui-package-for-r-shiny/">Announcing shiny.webawesome: a web UI package for R/Shiny</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/announcing-shiny-webawesome-a-web-ui-package-for-r-shiny/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><code>shiny.webawesome</code> brings Web Awesome to R Shiny through generated wrappers, reactive bindings, and a bundled runtime. It aims for complete component support while staying close enough to upstream that the Web Awesome docs and examples are directly useful in everyday package use.</p>
<p><a href="https://cran.r-project.org/package=shiny.webawesome" rel="nofollow" target="_blank">CRAN</a> | <a href="https://mbanand.r-universe.dev/shiny.webawesome" rel="nofollow" target="_blank">R-universe</a> | <a href="https://www.shiny-webawesome.org/" rel="nofollow" target="_blank">Package website</a> | <a href="https://github.com/mbanand/shiny.webawesome" rel="nofollow" target="_blank">Source repository</a></p>
<h2>Background</h2>
<p><code>shiny.webawesome</code> started from a perceived gap: Shiny would benefit from a UI library that feels modern, visually polished, and broad enough to support a full app coherently. Web Awesome was a strong fit because it combines rich components, layout and styling utilities, and detailed upstream documentation with a standards-based web-components structure that is straightforward to track from R. That makes it easier for the package to stay close to upstream while still fitting naturally into Shiny.</p>
<h2>The Whole Game</h2>
<p>Here’s a screenshot of a simple, complete example app using <code>shiny.webawesome</code>. The full live app and code are available in an article at <a href="https://mbanand.github.io/ghpages/announcement/" rel="nofollow" target="_blank">https://mbanand.github.io/ghpages/announcement/</a>.</p>
<p><a href="https://mbanand.github.io/ghpages/announcement/" rel="nofollow" target="_blank"><img loading="lazy" fetchpriority="high" decoding="async" src="https://i1.wp.com/r-posts.com/wp-content/uploads/2026/05/shiny-webawesome-example-328x300.png?resize=328%2C300" alt="Screenshot of a shiny.webawesome example app showing a control sidebar and coordinated chart, summary, and details views." width="328" height="300" class="alignnone size-medium wp-image-19178" srcset_temp="https://i1.wp.com/r-posts.com/wp-content/uploads/2026/05/shiny-webawesome-example-328x300.png?resize=328%2C300 328w, http://r-posts.com/wp-content/uploads/2026/05/shiny-webawesome-example-450x412.png 450w, http://r-posts.com/wp-content/uploads/2026/05/shiny-webawesome-example-768x703.png 768w, http://r-posts.com/wp-content/uploads/2026/05/shiny-webawesome-example.png 925w" sizes="(max-width: 328px) 100vw, 328px" data-recalc-dims="1" /></a></p>
<p>This example showcases many of the facilities available in the package:</p>
<ul>
	<li>a visually rich component library</li>
	<li>direct use of Web Awesome layout utilities such as <code>wa-stack</code>, <code>wa-cluster</code>, <code>wa-gap-*</code>, and <code>wa-align-*</code> classes</li>
	<li>styling through Web Awesome design tokens and classes such as <code>--wa-color-*</code>, <code>--wa-font-*</code>, and <code>wa-body-*</code></li>
	<li>reactive Shiny input bindings</li>
	<li>helpers for calling methods on HTML elements, setting properties, and injecting simple JavaScript snippets</li>
</ul>
<h2>Design Philosophy</h2>
<p><code>shiny.webawesome</code> is designed to stay close to upstream Web Awesome. Most component wrappers are generated from Web Awesome metadata, which helps preserve upstream names, structure, and behavior while translating the interface into normal R conventions such as <code>snake_case</code>.</p>
<p>That close alignment has a practical benefit: when you want deeper details, examples, or component-specific guidance, you can usually go straight to the upstream Web Awesome documentation and apply what you find directly in <code>shiny.webawesome</code>. The package currently supports all Web Awesome components, so the upstream docs are a practical reference for day-to-day use.</p>
<p>To support the server-client model of Shiny, the package adds a small set of page and layout helpers, curated reactive bindings, and a narrow command layer for cases where browser-side interaction goes beyond the generated wrappers.</p>
<p>The result is a package with a clear default path. Use generated wrappers for ordinary UI, use bindings for meaningful reactive state, and reach for commands or small JavaScript glue when the app needs them.</p>
<h2>Shiny Bindings</h2>
<p><code>shiny.webawesome</code> does not forward every browser event and every detail of component telemetry into Shiny. Much component state and interaction detail is better handled locally in the browser rather than turned into server messages. Consequently, the package exposes only a curated set of Shiny bindings that fit Shiny’s reactive model, with an emphasis on meaningful committed state rather than low-level browser event streams.</p>
<p>In the most common case, a binding publishes a durable semantic value. A select reports its current value, a dialog can report whether it is open, and a tree can report the currently selected item ids. The key idea is that Shiny receives the state the app actually cares about, not the raw event name that happened to produce it.</p>
<p>Some components are better treated as actions than values. A button is the clearest example: it behaves like a Shiny action input, with each click producing a new input event. A small number of components need both action semantics and a separate value. A dropdown, for example, may need to trigger reactivity on every choice, including repeated selections of the same item, while also exposing the latest selected value.</p>
<p>This design keeps reactive messaging to the server smaller, clearer, and easier to reason about. If an interaction belongs naturally in Shiny’s input model, <code>shiny.webawesome</code> will expose it as a binding. If it is more naturally a browser-side concern, it is usually a better fit for the command layer or a small amount of JavaScript glue.</p>
<p>For the full binding categories, semantics, and examples, see the package article: <a href="https://www.shiny-webawesome.org/articles/shiny-bindings.html" rel="nofollow" target="_blank">Shiny Bindings</a>.</p>
<h2>Command API</h2>
<p><code>shiny.webawesome</code> covers the most common interaction patterns through generated wrappers, Shiny bindings, and update helpers. But sometimes an app still needs to reach into a live browser element directly: set a property, call a method, or add a small browser-local JavaScript snippet.</p>
<p>For those cases, the package provides a narrow command API. The two main server-side helpers are <code>wa_set_property()</code> and <code>wa_call_method()</code>. They let Shiny code send one-way commands to a browser element identified by <code>id</code>, either by assigning a value to a live property or invoking a browser-side method.</p>
<p>If a component already has a binding or update helper, that should usually remain the first choice. The command layer is for the cases that fall just outside those built-in paths, where the simplest solution is still to tell the existing browser component to do one specific thing.</p>
<p>The package also includes <code>wa_js()</code> for a different kind of job: small, app-local JavaScript glue. That is useful when the missing piece is browser-side logic such as listening for an event, reading live component state, or publishing a derived value back to Shiny with <code>Shiny.setInputValue()</code>.</p>
<p>For more detail and examples, see the package article: <a href="https://www.shiny-webawesome.org/articles/command-api.html" rel="nofollow" target="_blank">Command API</a>.</p>
<h2>Conclusion</h2>
<p><code>shiny.webawesome</code> brings a visually rich component library into Shiny while staying close to upstream Web Awesome. That combination gives polished components, useful layout and styling utilities, and a workflow where upstream documentation and examples remain directly relevant throughout app development.</p>
<p>For more examples, longer articles, and full reference material, see the package website: <a href="https://www.shiny-webawesome.org/" rel="nofollow" target="_blank">shiny-webawesome.org</a>.</p><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/announcing-shiny-webawesome-a-web-ui-package-for-r-shiny/" rel="nofollow" target="_blank">Announcing shiny.webawesome: a web UI package for R/Shiny</a> was first posted on June 18, 2026 at 5:09 am.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/announcing-shiny-webawesome-a-web-ui-package-for-r-shiny/"> R-posts.com</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/announcing-shiny-webawesome-a-web-ui-package-for-r-shiny/">Announcing shiny.webawesome: a web UI package for R/Shiny</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402094</post-id>	</item>
		<item>
		<title>RStudio AI That Doesn’t Cost a Penny: llmcoder vs. Posit AI Assistant</title>
		<link>https://www.r-bloggers.com/2026/06/rstudio-ai-that-doesnt-cost-a-penny-llmcoder-vs-posit-ai-assistant/</link>
		
		<dc:creator><![CDATA[Shiyang Zheng]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 05:08:52 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=19190</guid>

					<description><![CDATA[<p>Introduction If you’re an R user, you’ve probably experienced these moments: You’re writing code and forgot the exact syntax for a function Your code throws an error and you’re staring at a confusing error message You have a block of code but want to understand what ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/rstudio-ai-that-doesnt-cost-a-penny-llmcoder-vs-posit-ai-assistant/">RStudio AI That Doesn’t Cost a Penny: llmcoder vs. Posit AI Assistant</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/rstudio-ai-that-doesnt-cost-a-penny-llmcoder-vs-posit-ai-assistant/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<h2>Introduction</h2>
<p>If you’re an R user, you’ve probably experienced these moments:</p>
<ul>
	<li>You’re writing code and forgot the exact syntax for a function</li>
	<li>Your code throws an error and you’re staring at a confusing error message</li>
	<li>You have a block of code but want to understand what it does in plain English</li>
	<li>You want to chat with an AI assistant about your data analysis, but don’t want to leave RStudio</li>
</ul>
<p><strong>llmcoder</strong> is an RStudio addin that solves all of these problems by integrating Large Language Model (LLM) assistance directly into your RStudio workflow, and more importantly, it’s <strong>FREE</strong>!</p>
<p>In this post, I’ll show you how <strong>llmcoder</strong> can speed up your R coding and make your workflow smoother.</p>
<p><strong>Watch a quick demo of llmcoder in action:</strong></p>
<p><a href="https://youtu.be/SRzjaURbKCw" rel="nofollow" target="_blank">https://youtu.be/SRzjaURbKCw</a></p>
<hr />
<h2>Installation</h2>
<p>You can install llmcoder from GitHub:</p>
<pre># Install remotes if you haven't already
if (!requireNamespace(&quot;remotes&quot;, quietly = TRUE)) {
  install.packages(&quot;remotes&quot;)
}

# Install llmcoder
remotes::install_github(&quot;ShiyangZheng/llmcoder&quot;)</pre>
<p>Load the package:</p>
<pre>library(llmcoder)</pre>
<hr />
<h2>Feature 1: Generate R Code from Inline Comments</h2>
<p>Ever wish you could just type what you want in plain English and get R code instantly?</p>
<p><strong>How to use:</strong></p>
<ol>
	<li>Type a comment describing what you want</li>
	<li>Place your cursor on that line</li>
	<li>Use the <strong>Addins menu</strong> and select “Generate Code from Comment”</li>
</ol>
<p><strong>Example:</strong></p>
<pre># Load the mtcars dataset and create a scatter plot of mpg vs wt, colored by number of cylinders</pre>
<p>After running the addin, the comment is replaced with:</p>
<pre>library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    title = &quot;Fuel Efficiency vs Weight by Cylinder Count&quot;,
    x = &quot;Weight (1000 lbs)&quot;,
    y = &quot;Miles per Gallon&quot;,
    color = &quot;Cylinders&quot;
  ) +
  theme_minimal()</pre>
<p><strong>No more switching to ChatGPT or copying code from Stack Overflow!</strong></p>
<hr />
<h2>Feature 2: Fix Console Errors with LLM Assistance</h2>
<p>We’ve all been there – a cryptic error message and you’re not sure what went wrong.</p>
<p><strong>How to use:</strong></p>
<ol>
	<li>Run code that produces an error</li>
	<li>The error appears in the console</li>
	<li>Use the <strong>Addins menu</strong> and select “Fix Error with LLM”</li>
</ol>
<p><strong>Example:</strong></p>
<pre>library(dplyr)
my_data %&gt;%
  filter(cyl == 4) %&gt;%
  summary()
# Error: object 'my_data' not found</pre>
<p>llmcoder captures the error and sends it to the LLM, which returns an explanation and suggests:</p>
<pre>mtcars %&gt;% filter(cyl == 4) %&gt;% summary()</pre>
<hr />
<h2>Feature 3: Explain Selected Code in Plain English</h2>
<p>Sometimes you inherit code from a colleague or find a Stack Overflow answer and want to understand what it does.</p>
<p><strong>How to use:</strong></p>
<ol>
	<li>Select a block of code in the editor</li>
	<li>Use the <strong>Addins menu</strong> and select “Explain Code”</li>
</ol>
<p><strong>Example:</strong></p>
<pre>mtcars %&gt;%
  group_by(cyl) %&gt;%
  summarize(
    mean_mpg = mean(mpg, na.rm = TRUE),
    sd_mpg = sd(mpg, na.rm = TRUE),
    count = n()
  ) %&gt;%
  arrange(desc(mean_mpg))</pre>
<p>llmcoder returns:</p>
<ol>
	<li>Takes the built-in <code>mtcars</code> dataset</li>
	<li>Groups the data by the number of cylinders (<code>cyl</code>)</li>
	<li>Calculates the mean and standard deviation of miles per gallon (<code>mpg</code>) for each group</li>
	<li>Arranges the results in descending order of mean fuel efficiency</li>
</ol>
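<p>One way to check an explanation like this is to reproduce the same summary by other means. The sketch below is a base-R equivalent written for illustration (it is not llmcoder output); it also shows the <code>count</code> column computed by <code>n()</code>, which the explanation above does not mention. The object names <code>mpg_by_cyl</code> and <code>res</code> are arbitrary.</p>
<pre># Base-R equivalent of the dplyr pipeline above (base and stats only).
mpg_by_cyl = split(mtcars$mpg, mtcars$cyl)

res = data.frame(
  cyl = as.numeric(names(mpg_by_cyl)),
  mean_mpg = vapply(mpg_by_cyl, mean, numeric(1), na.rm = TRUE),
  sd_mpg = vapply(mpg_by_cyl, sd, numeric(1), na.rm = TRUE),
  count = lengths(mpg_by_cyl)
)

# Sort by mean fuel efficiency, highest first, and drop the group row names
res = res[order(res$mean_mpg, decreasing = TRUE), ]
rownames(res) = NULL
res</pre>
<p>Splitting first and then applying one function per column mirrors <code>group_by()</code> followed by <code>summarize()</code>; the final <code>order()</code> call plays the role of <code>arrange(desc(mean_mpg))</code>.</p>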
<hr />
<h2>Feature 4: Multi-Turn Chat Panel with Session Context</h2>
<p>This is the flagship feature. llmcoder includes a <strong>Chat Panel</strong> that understands your current R session.</p>
<p><strong>How to open:</strong> Use the <strong>Addins menu</strong> and select “Open Chat Panel”</p>
<p><strong>What makes it special?</strong></p>
<p>The Chat Panel is <strong>session-aware</strong>:</p>
<ul>
	<li>It knows which packages you have loaded</li>
	<li>It knows what objects are in your global environment</li>
	<li>It can read the contents of your current script</li>
	<li>It has access to your recent console history</li>
</ul>
<p><strong>Example conversation:</strong></p>
<p><strong>You:</strong> What’s the correlation between mpg and wt in mtcars?</p>
<p><strong>AI:</strong> The correlation between mpg and wt in the mtcars dataset is -0.87, indicating a strong negative relationship. As weight increases, fuel efficiency decreases.</p>
<pre>cor(mtcars$mpg, mtcars$wt, use = &quot;complete.obs&quot;)</pre>
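<p>Because Chat Panel answers come from an LLM, numeric claims are worth confirming in the console. A minimal check with base R’s <code>cor()</code> and <code>cor.test()</code> (an illustration added here, not output produced by llmcoder):</p>
<pre># Pearson correlation between fuel efficiency and weight
r = cor(mtcars$mpg, mtcars$wt)
round(r, 2)

# Test of the null hypothesis that the true correlation is zero
ct = cor.test(mtcars$mpg, mtcars$wt)
ct$estimate
ct$p.value</pre>
<p>The rounded value agrees with the -0.87 quoted in the conversation, and the very small p-value supports describing the relationship as strong rather than accidental.</p>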
<p><strong>Want to see the Chat Panel in action?</strong> Watch this demo:<br />
<a href="https://youtu.be/zP-RuCN3q14" rel="nofollow" target="_blank">https://youtu.be/zP-RuCN3q14</a></p>
<hr />
<h2>Supported LLM Providers</h2>
<p>llmcoder supports <strong>multiple LLM providers</strong> – you can choose the one that works best for you:</p>
<table>
<tbody>
<tr>
<th>Provider</th>
<th>API Key</th>
<th>Notes</th>
</tr>
<tr>
<td>OpenAI (GPT-4/3.5)</td>
<td>Yes</td>
<td>Most popular</td>
</tr>
<tr>
<td>Anthropic (Claude)</td>
<td>Yes</td>
<td>Great for long conversations</td>
</tr>
<tr>
<td>DeepSeek</td>
<td>Yes</td>
<td>Cost-effective</td>
</tr>
<tr>
<td>Groq</td>
<td>Yes</td>
<td>Very fast inference</td>
</tr>
<tr>
<td>Together AI</td>
<td>Yes</td>
<td>Open-source models</td>
</tr>
<tr>
<td>OpenRouter</td>
<td>Yes</td>
<td>Access multiple models</td>
</tr>
<tr>
<td><strong>Ollama</strong></td>
<td><strong>No</strong></td>
<td><strong>Fully local, no API key!</strong></td>
</tr>
<tr>
<td>Custom endpoint</td>
<td>Yes</td>
<td>LM Studio, vLLM, llama.cpp</td>
</tr>
</tbody>
</table>
<p><strong>Privacy note:</strong> If you use Ollama, all processing happens <strong>locally on your machine</strong>. No data is sent to external servers.</p>
<hr />
<h2>Customization: Choose Your Prompt Style</h2>
<p>The Chat Panel allows you to select different <strong>prompt styles</strong>:</p>
<ul>
	<li><strong>General Assistant</strong>: Best for general questions</li>
	<li><strong>R Code Helper</strong>: Focuses on writing clean, idiomatic R code</li>
	<li><strong>Statistics Advisor</strong>: Helps with statistical concepts and test selection</li>
	<li><strong>Research (Psycho)</strong>: Tailored for psycholinguistics researchers</li>
</ul>
<hr />
<h2>Why llmcoder?</h2>
<p>There are many AI coding assistants out there (Copilot, Cursor, etc.), so why llmcoder?</p>
<ol>
	<li><strong>Native RStudio integration</strong>: No need to switch to another app or browser tab</li>
	<li><strong>Session-aware</strong>: The LLM knows what you’re working on</li>
	<li><strong>Multiple LLM providers</strong>: Choose the one you prefer (or use a local model for privacy)</li>
	<li><strong>Open source</strong>: MIT license, free to use and modify</li>
	<li><strong>Designed for R users</strong>: Not a generic coding assistant – it understands R-specific workflows</li>
</ol>
<hr />
<h2>Call to Action</h2>
<p>Ready to try llmcoder?</p>
<pre>remotes::install_github(&quot;ShiyangZheng/llmcoder&quot;)</pre>
<p>GitHub: <a href="https://github.com/ShiyangZheng/llmcoder" rel="nofollow" target="_blank">https://github.com/ShiyangZheng/llmcoder</a></p>
<p>If you encounter any bugs or have feature requests, please file an issue: <a href="https://github.com/ShiyangZheng/llmcoder/issues" rel="nofollow" target="_blank">https://github.com/ShiyangZheng/llmcoder/issues</a></p>
<p><strong>Star the repo</strong> if you find it useful!</p>
<hr />
<h2>About the Author</h2>
<p>Shiyang Zheng is a PhD student in Psycholinguistics at the University of Nottingham. His research focuses on idiom acquisition and computational modeling. He built llmcoder to make R coding easier for himself and the R community.</p>
<ul>
	<li>GitHub: <a href="https://github.com/ShiyangZheng" rel="nofollow" target="_blank">@ShiyangZheng</a></li>
	<li>Academic website: <a href="https://shiyangzheng.top/" rel="nofollow" target="_blank">shiyangzheng.top</a></li>
	<li>ORCID: <a href="https://orcid.org/0000-0003-0511-4683" rel="nofollow" target="_blank">0000-0003-0511-4683</a></li>
</ul><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/rstudio-ai-that-doesnt-cost-a-penny-llmcoder-vs-posit-ai-assistant/" rel="nofollow" target="_blank">RStudio AI That Doesn’t Cost a Penny: llmcoder vs. Posit AI Assistant</a> was first posted on June 18, 2026 at 5:08 am.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/rstudio-ai-that-doesnt-cost-a-penny-llmcoder-vs-posit-ai-assistant/"> R-posts.com</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/rstudio-ai-that-doesnt-cost-a-penny-llmcoder-vs-posit-ai-assistant/">RStudio AI That Doesn’t Cost a Penny: llmcoder vs. Posit AI Assistant</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402096</post-id>	</item>
		<item>
		<title>New CRAN Package for sparse PCA – msPCA</title>
		<link>https://www.r-bloggers.com/2026/06/new-cran-package-for-sparse-pca-mspca/</link>
		
		<dc:creator><![CDATA[Jean Pauphilet]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 05:08:40 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=17970</guid>

					<description><![CDATA[<p>The package msPCA is now available on CRAN! <br />
It implements a new method for computing multiple sparse principal components of a dataset. Unlike other available packages, it generates PCs that are sparse and orthogonal, leading to a generally higher fra...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/new-cran-package-for-sparse-pca-mspca/">New CRAN Package for sparse PCA – msPCA</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/new-cran-package-for-sparse-pca-mspca/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
The package <a href="https://cran.r-project.org/web/packages/msPCA/index.html" rel="nofollow" target="_blank">msPCA</a> is now available on CRAN! <br />
It implements a new method for computing <span style="text-decoration: underline">multiple</span> sparse principal components of a dataset. Unlike other available packages, it generates PCs that are sparse and orthogonal, leading to a generally higher fraction of variance explained. <br />
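<p>To make the two properties in that comparison concrete, here is a base-R sketch of how one might measure them for any set of loading vectors. It is <strong>not</strong> the msPCA algorithm. It takes ordinary PCA loadings from <code>prcomp()</code> and sparsifies them naively by keeping each column's largest entries, using a hypothetical <code>sparsify()</code> helper. It then computes the fraction of variance captured by their span and the largest overlap between loading vectors. Real sparse PCA methods choose sparse loadings far more carefully.</p>

```r
# Illustrative only: naive sparsification, NOT the msPCA method.
set.seed(1)
X <- scale(matrix(rnorm(200 * 10), nrow = 200, ncol = 10), scale = FALSE)

# Dense PCA loadings for the first two components (orthonormal columns)
W_dense <- prcomp(X, center = FALSE)$rotation[, 1:2]

# Hypothetical helper: keep the k largest-magnitude entries per column, renormalize
sparsify <- function(W, k) {
  apply(W, 2, function(w) {
    keep <- order(abs(w), decreasing = TRUE)[seq_len(k)]
    w[-keep] <- 0
    w / sqrt(sum(w^2))
  })
}
W_sparse <- sparsify(W_dense, k = 3)

# Fraction of total variance captured by the subspace spanned by the columns of W
fraction_variance_explained <- function(X, W) {
  P <- W %*% solve(crossprod(W)) %*% t(W)  # orthogonal projector onto span(W)
  sum((X %*% P)^2) / sum(X^2)
}

# Largest absolute inner product between two different loading vectors
# (0 means the loadings are mutually orthogonal)
max_overlap <- function(W) {
  G <- crossprod(W)
  diag(G) <- 0
  max(abs(G))
}

fraction_variance_explained(X, W_dense)
fraction_variance_explained(X, W_sparse)
max_overlap(W_dense)
max_overlap(W_sparse)
```

<p>The leading dense PCs span the best possible two-dimensional subspace, so any sparse pair can at most match their variance. Once loadings stop being orthogonal, adding up per-component variances would double-count shared variance, which is why the sketch measures the joint span instead.</p>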
<img loading="lazy" decoding="async" src="https://i2.wp.com/r-posts.com/wp-content/uploads/2025/12/n_vs_orthogonality-436x300.png?resize=436%2C300" alt="" width="436" height="300" class="alignnone size-medium wp-image-17971" srcset_temp="https://i2.wp.com/r-posts.com/wp-content/uploads/2025/12/n_vs_orthogonality-436x300.png?resize=436%2C300 436w, http://r-posts.com/wp-content/uploads/2025/12/n_vs_orthogonality-450x310.png 450w, http://r-posts.com/wp-content/uploads/2025/12/n_vs_orthogonality-768x529.png 768w, http://r-posts.com/wp-content/uploads/2025/12/n_vs_orthogonality.png 1368w" sizes="(max-width: 436px) 100vw, 436px" data-recalc-dims="1" /> <img loading="lazy" decoding="async" src="https://i2.wp.com/r-posts.com/wp-content/uploads/2025/12/n_vs_variance-436x300.png?resize=436%2C300" alt="" width="436" height="300" class="alignnone size-medium wp-image-17972" srcset_temp="https://i2.wp.com/r-posts.com/wp-content/uploads/2025/12/n_vs_variance-436x300.png?resize=436%2C300 436w, http://r-posts.com/wp-content/uploads/2025/12/n_vs_variance-450x310.png 450w, http://r-posts.com/wp-content/uploads/2025/12/n_vs_variance-768x529.png 768w, http://r-posts.com/wp-content/uploads/2025/12/n_vs_variance.png 1368w" sizes="(max-width: 436px) 100vw, 436px" data-recalc-dims="1" /><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/new-cran-package-for-sparse-pca-mspca/" rel="nofollow" target="_blank">New CRAN Package for sparse PCA – msPCA</a> was first posted on June 18, 2026 at 5:08 am.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/new-cran-package-for-sparse-pca-mspca/"> R-posts.com</a></strong>.</div>
<hr />
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/new-cran-package-for-sparse-pca-mspca/">New CRAN Package for sparse PCA – msPCA</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402106</post-id>	</item>
		<item>
		<title>DIVINE: a new R package for working with a real-world COVID-19 clinical cohort</title>
		<link>https://www.r-bloggers.com/2026/06/divine-a-new-r-package-for-working-with-a-real-world-covid-19-clinical-cohort/</link>
		
		<dc:creator><![CDATA[Cristian Tebé]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 05:08:18 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://r-posts.com/?p=19290</guid>

					<description><![CDATA[<p>Clinical data are rarely as clean, compact or convenient as the examples we often use when teaching statistics or R. Real hospital datasets are usually distributed across several tables, include missing values, contain repeated structures, and require careful documentation before they can be reused. The new R package DIVINE is ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/divine-a-new-r-package-for-working-with-a-real-world-covid-19-clinical-cohort/">DIVINE: a new R package for working with a real-world COVID-19 clinical cohort</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://r-posts.com/divine-a-new-r-package-for-working-with-a-real-world-covid-19-clinical-cohort/"> R-posts.com</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p>
Clinical data are rarely as clean, compact or convenient as the examples we often use when teaching statistics or R. Real hospital datasets are usually distributed across several tables, include missing values, contain repeated structures, and require careful documentation before they can be reused.</p>
<p><img loading="lazy" decoding="async" src="https://i1.wp.com/r-posts.com/wp-content/uploads/2026/06/divine_logo1-300x300.png?resize=300%2C300" alt="" width="300" height="300" class="size-medium wp-image-19291 alignright" srcset_temp="https://i1.wp.com/r-posts.com/wp-content/uploads/2026/06/divine_logo1-300x300.png?resize=300%2C300 300w, http://r-posts.com/wp-content/uploads/2026/06/divine_logo1-450x450.png 450w, http://r-posts.com/wp-content/uploads/2026/06/divine_logo1-150x150.png 150w, http://r-posts.com/wp-content/uploads/2026/06/divine_logo1-768x768.png 768w, http://r-posts.com/wp-content/uploads/2026/06/divine_logo1-1536x1536.png 1536w, http://r-posts.com/wp-content/uploads/2026/06/divine_logo1.png 1563w" sizes="auto, (max-width: 300px) 100vw, 300px" data-recalc-dims="1" />The new R package <a href="https://bruigtp.github.io/DIVINE/" rel="nofollow" target="_blank"><code>DIVINE</code></a> is interesting precisely because it brings that reality into the R ecosystem in an accessible way. Available on CRAN, <a href="https://cran.r-project.org/package=DIVINE" rel="nofollow" target="_blank"><code>DIVINE</code></a> provides a curated collection of datasets from a multicentre cohort of hospitalized COVID-19 patients in the south metropolitan area of Barcelona. The package is accompanied by a recent publication in <a href="https://www.nature.com/articles/s41597-026-07479-7" rel="nofollow" target="_blank"><em>Scientific Data</em></a>, which describes the database, its structure, the data collection process and its potential reuse for clinical epidemiology, teaching and methodological research.</p>
<h2>A clinical dataset packaged for R</h2>
<p>The package includes 14 datasets covering different clinical domains, such as demographics, comorbidities, symptoms, vital signs, severity scores, ICU information, treatments, complications, vaccination and end-of-follow-up data.</p>
<p>This relational structure is one of the most valuable aspects of the package. Instead of providing a single pre-merged analysis file, <code>DIVINE</code> preserves the logic of a real clinical database, where information is distributed across several linked tables. This makes it especially useful for applied teaching and for demonstrating realistic data-management workflows in R.</p>
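<p>To install the package from CRAN and list the datasets it ships:</p>
<pre># Install from CRAN and attach the package
install.packages(&quot;DIVINE&quot;)
library(DIVINE)

# List the datasets included in the package
data(package = &quot;DIVINE&quot;)
</pre>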
<p>The datasets can then be loaded in the usual way:</p>
<pre>data(&quot;demographic&quot;)
data(&quot;vital_signs&quot;)
data(&quot;scores&quot;)
</pre>
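<p>The tables share common identifiers (the example below joins on <code>record_id</code>, <code>covid_wave</code> and <code>center</code>). These let users combine information across tables and build analysis datasets tailored to the research question.</p>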
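<p>Conceptually, combining tables on shared identifiers is a multi-key left join. The sketch below performs one in base R with <code>merge()</code> and <code>Reduce()</code> on three tiny, made-up tables. Apart from the join keys used later in this post, the table names, column names and values are hypothetical and do not reproduce the real <code>DIVINE</code> datasets.</p>

```r
# Toy stand-ins for three linked tables (hypothetical values and columns)
demo <- data.frame(
  record_id = c(1, 2, 3), covid_wave = c(1, 1, 2), center = c("A", "A", "B"),
  age = c(70, 55, 62)
)
vitals <- data.frame(
  record_id = c(1, 3), covid_wave = c(1, 2), center = c("A", "B"),
  heart_rate = c(88, 101)
)
severity <- data.frame(
  record_id = c(2, 3), covid_wave = c(1, 2), center = c("A", "B"),
  severity_score = c(1, 2)
)

keys <- c("record_id", "covid_wave", "center")

# Left-join every table onto the first one, matching on all key columns
left_join_all <- function(tables, by) {
  Reduce(function(x, y) merge(x, y, by = by, all.x = TRUE), tables)
}

baseline_toy <- left_join_all(list(demo, vitals, severity), by = keys)
baseline_toy
```

<p>A left join keeps every row of the first table and fills unmatched measurements with <code>NA</code>. That is usually what you want when the demographic table defines the cohort.</p>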
<h2>More than a data package</h2>
<p>Although the datasets are the main contribution, <code>DIVINE</code> also includes helper functions for common epidemiological data workflows. These include:</p>
<pre>data_overview()
multi_join()
stats_table()
multi_plot()
impute_missing()
export_data()
</pre>
<p>These functions are not intended to replace the broader R ecosystem, but they make the package easier to use in teaching, exploratory analysis and reproducible examples.</p>
<p>A minimal workflow might look like this:</p>
<pre>library(DIVINE)

data(&quot;demographic&quot;)
data(&quot;vital_signs&quot;)
data(&quot;scores&quot;)

# Left-join the three tables on their shared identifiers
baseline &lt;- multi_join(
  list(demographic, vital_signs, scores),
  key = c(&quot;record_id&quot;, &quot;covid_wave&quot;, &quot;center&quot;),
  join_type = &quot;left&quot;
)

# Inspect the merged table
data_overview(baseline)

# Descriptive summary of age and sex by COVID-19 wave, with p-values
stats_table(
  baseline,
  vars = c(&quot;age&quot;, &quot;sex&quot;),
  by = &quot;covid_wave&quot;,
  statistic_type = &quot;median_iqr&quot;,
  pvalue = TRUE
)
</pre>
<p>This example already illustrates several important aspects of clinical data analysis: understanding table structure, joining related datasets, checking variables, and producing descriptive summaries.</p>
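<p>For readers curious about what a "median [IQR] by group" summary involves, here is a base-R sketch on made-up data. It is not how <code>stats_table()</code> is implemented. In particular, using a Kruskal-Wallis test for the p-value is an assumption made for illustration, and <code>median_iqr_by()</code> is a hypothetical helper.</p>

```r
# Hypothetical helper: format "median [Q1, Q3]" for each group
median_iqr_by <- function(x, group) {
  tapply(x, group, function(v) {
    q <- quantile(v, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    sprintf("%.1f [%.1f, %.1f]", q[2], q[1], q[3])
  })
}

# Made-up ages for two COVID-19 waves
toy <- data.frame(
  age = c(60, 70, 80, 50, 55, 65),
  covid_wave = c(1, 1, 1, 2, 2, 2)
)

median_iqr_by(toy$age, toy$covid_wave)

# Nonparametric test for a difference in age between waves (illustrative choice)
kruskal.test(age ~ factor(covid_wave), data = toy)$p.value
```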
<h2>Why it is useful for R users</h2>
<p>For a specialised R audience, the value of <code>DIVINE</code> is not only that it provides COVID-19 data. Its main interest is that it offers a realistic, documented and reusable clinical database within a familiar R workflow.</p>
<p>The package may be useful for:</p>
<ul>
	<li>
<p>teaching data management with relational clinical datasets;</p>
</li>
	<li>
<p>preparing examples for biostatistics or epidemiology courses;</p>
</li>
	<li>
<p>demonstrating descriptive clinical analyses;</p>
</li>
	<li>
<p>exploring missing data and variable availability;</p>
</li>
	<li>
<p>developing prognostic modelling examples;</p>
</li>
	<li>
<p>validating prediction models;</p>
</li>
	<li>
<p>creating reproducible workflows using real-world health data.</p>
</li>
</ul>
<p>This makes <code>DIVINE</code> particularly attractive for applied biostatisticians, epidemiologists, clinical researchers and R instructors who want to move beyond toy datasets.</p><hr style="border-top: black solid 1px" /><a href="http://r-posts.com/divine-a-new-r-package-for-working-with-a-real-world-covid-19-clinical-cohort/" rel="nofollow" target="_blank">DIVINE: a new R package for working with a real-world COVID-19 clinical cohort</a> was first posted on June 18, 2026 at 5:08 am.<br />
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://r-posts.com/divine-a-new-r-package-for-working-with-a-real-world-covid-19-clinical-cohort/"> R-posts.com</a></strong>.</div>
<hr />
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/divine-a-new-r-package-for-working-with-a-real-world-covid-19-clinical-cohort/">DIVINE: a new R package for working with a real-world COVID-19 clinical cohort</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402123</post-id>	</item>
		<item>
		<title>Snapshot Testing in R: Beyond Screenshots</title>
		<link>https://www.r-bloggers.com/2026/06/snapshot-testing-in-r-beyond-screenshots/</link>
		
		<dc:creator><![CDATA[Jakub Sobolewski]]></dc:creator>
		<pubDate>Thu, 18 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Snapshot testing is not just for screenshots. Capture console output, logs, data frames, errors, and whole data structures. Practices that keep snapshot tests trustworthy instead of brittle.</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/snapshot-testing-in-r-beyond-screenshots/">Snapshot Testing in R: Beyond Screenshots</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots"> Jakub Sobolewski</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
<p><img src="https://i2.wp.com/jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots/og-image.png?w=578&#038;ssl=1" alt="Snapshot Testing in R: Beyond Screenshots" data-recalc-dims="1" /></p><p>Snapshot testing is not about screenshots.</p>
<p>Most people meet it through UI regression tests: render a component, save a picture, fail the build when the picture changes. So the technique gets filed away as “the thing that compares images.” That is one use, but not the only one.</p>
<p>The mechanic underneath is general. Capture some output, save it to a file, and on every later run compare fresh output against the saved copy. The output can be a plot. It can also be console text, a log, a data frame, an error message, or a deeply nested list. Anything you can serialize, you can snapshot.</p>
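<p>To show how little machinery that loop needs, here is a toy version in base R. The <code>toy_snapshot()</code> function is hypothetical and is not how testthat is implemented. It prints a value, records the output to a file on the first run, and fails on later runs if the output differs.</p>

```r
# Toy illustration of the snapshot loop (not how testthat is implemented)
toy_snapshot <- function(value, file) {
  output <- capture.output(print(value))
  if (!file.exists(file)) {
    writeLines(output, file)                  # first run: record the output
    message("New snapshot recorded: ", file)
    return(invisible(TRUE))
  }
  expected <- readLines(file)                 # later runs: compare
  if (!identical(output, expected)) {
    stop("Snapshot mismatch in ", file, call. = FALSE)
  }
  invisible(TRUE)
}

snap_file <- tempfile(fileext = ".txt")
toy_snapshot(summary(1:10), snap_file)       # records the snapshot
toy_snapshot(summary(1:10), snap_file)       # passes: same output
try(toy_snapshot(summary(1:11), snap_file))  # fails: output changed
```

<p>Everything testthat adds on top of this (readable diffs, per-file storage under <code>_snaps/</code>, review and accept tools, variants) exists to make that loop pleasant at scale.</p>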
<p>What makes it powerful is also what makes it dangerous: <strong>you are the test oracle.</strong> There is no <code>expect_equal(result, 42)</code> stating the answer up front. You accept the first snapshot because you read it and judged it correct. Get that review wrong, or skip it, and you have pinned a bug in place and called it a passing test.</p>
<p>In this post I walk through what snapshot testing is good for, and the practices that keep it efficient and trustworthy.</p>
<h2 id="what-snapshot-testing-actually-is">What snapshot testing actually is</h2>
<p>In testthat’s third edition the entry points are <code>expect_snapshot()</code> and <code>expect_snapshot_file()</code>. The first run records output into a <code>_snaps/</code> directory next to your tests. <code>expect_snapshot()</code> writes a <code>.md</code> file named after the test file (without its <code>test-</code> prefix), while <code>expect_snapshot_file()</code> stores whole files in a subdirectory named after the test file. Every run after that compares against what’s recorded. A mismatch fails the test and shows you a diff.</p>
<pre># tests/testthat/test-summary.R
test_that(&quot;summary() prints the six-number overview&quot;, {
  expect_snapshot(print(summary(1:10)))
})</pre>
<p>The first time, testthat writes the printed output to <code>_snaps/summary.md</code> and the test passes (with a note that a new snapshot was recorded). From then on, that file is the expected value.</p>
<p>You reach for it in situations like these:</p>
<ul>
<li>The output is <strong>large or tedious to assert field by field</strong>, but you can <em>recognize</em> whether it’s correct by looking at it.</li>
<li>The output is <strong>impossible to express in code</strong>: a rendered plot, a rendered table, any image.</li>
<li>The output is <strong>impractical to express in code</strong>: a formatted CLI report, a full console transcript with its alignment. You can’t write <code>expect_equal()</code> for “the table is laid out correctly.” You can look at it and know.</li>
</ul>
<h2 id="five-practices-that-keep-snapshots-trustworthy">Five practices that keep snapshots trustworthy</h2>
<p>Snapshot suites rot in predictable ways: noise the diff engine flags as failures, snapshots nobody can review, tests that flake on a different machine, and (as with any other test) titles that say nothing.</p>
<p>The following practices prevent that.</p>
<h3 id="1-scope-to-exactly-what-proves-the-behavior">1. Scope to exactly what proves the behavior</h3>
<p>Capture the plot, not the page the plot lives on.</p>
<p>If you’re testing that a chart colors points correctly, snapshot the chart. Not the dashboard it’s embedded in, with its header, its sidebar, the current date in the corner, and a “last refreshed” timestamp. Every one of those is unrelated to the behavior under test, and every one is a reason for the comparison to fail when nothing you care about changed.</p>
<p>A snapshot’s diff engine is literal. It flags any difference. So don’t hand it differences that don’t matter. Scope the capture down to the smallest thing that demonstrates the behavior, and the only way the test can fail is if that behavior breaks.</p>
<p><strong>Don’t give the diff engine extra reasons to report false positives.</strong></p>
<h3 id="2-make-snapshots-human-readable">2. Make snapshots human-readable</h3>
<p>You are going to review these files by eye. So they have to be readable by eye.</p>
<p>Store snapshots as text: markdown, CSV, SVG, JSON. Never as a binary blob. I heard on the R Weekly podcast that some teams keep snapshots as <code>.rds</code> files, and I’d push back on that hard. A binary snapshot can’t be read in an editor, can’t be reviewed in a pull request, and can’t be diffed when it changes. It defeats the entire premise: the whole technique rests on a human being able to look at the recorded output and decide it’s right. You also want to help your code reviewers do that, so don’t hide the “truth” in a binary file. This matters most when the first snapshot is accepted as “the truth”: make it easy for your collaborators to read and judge it!</p>
<p><strong>Don’t introduce extra points of friction. Keep it simple.</strong></p>
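<p>As an illustration of keeping data snapshots textual, here is a small hypothetical helper called <code>write_csv_snapshot()</code>. It sorts a data frame's rows so the file is deterministic, writes them as plain CSV, and returns the path. In a test you would pass that path to <code>expect_snapshot_file()</code> with a <code>.csv</code> name, so the snapshot stays readable and diffable in a pull request.</p>

```r
# Hypothetical helper: serialize a data frame as sorted, plain-text CSV
write_csv_snapshot <- function(df, order_by = names(df)[1],
                               path = tempfile(fileext = ".csv")) {
  # Sort rows by the chosen columns so the output does not depend on input order
  df <- df[do.call(order, unname(as.list(df[order_by]))), , drop = FALSE]
  write.csv(df, path, row.names = FALSE)
  path
}

result <- data.frame(id = c(3, 1, 2), value = c("c", "a", "b"))
path <- write_csv_snapshot(result, order_by = "id")
readLines(path)

# In a testthat (3rd edition) test, something like:
# expect_snapshot_file(write_csv_snapshot(result, order_by = "id"), name = "result.csv")
```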
<h3 id="3-remove-nondeterminism-or-filter-whats-left">3. Remove nondeterminism, or filter what’s left</h3>
<p>A snapshot that changes on every run is useless. Timestamps, random IDs, elapsed-time measurements, unordered query results. Any of these will make the file churn and train you to accept changes blindly.</p>
<p>Fix it at the source first. Inject the things that vary so the test controls them: pass a fixed clock instead of calling <code>Sys.time()</code>, set a seed, supply IDs rather than generating them. This is dependency injection, the same move that makes any code testable.</p>
<p>When you can’t remove the variation, filter it. testthat’s <code>expect_snapshot()</code> takes a <code>transform</code> argument: a function that cleans each line of output before it’s compared. Strip the timestamps, drop the spinner characters, normalize the paths. For data, impose a deterministic order before you serialize.</p>
<p><strong>Don’t let snapshots change when there is no reason for them to change.</strong></p>
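<p>Both moves fit in a few lines. The sketch uses two hypothetical functions. The first, <code>log_line()</code>, takes its clock as an argument so a test can pass a fixed one. The second, <code>scrub_volatile()</code>, has the shape <code>expect_snapshot()</code>'s <code>transform</code> argument expects (a character vector of lines in, a character vector out) and replaces timestamps and UUIDs with placeholders.</p>

```r
# Hypothetical logger: the clock is injected, defaulting to the real one
log_line <- function(msg, now = Sys.time) {
  sprintf("[%s] %s", format(now(), "%Y-%m-%d %H:%M:%S"), msg)
}

# A fixed clock for tests
fixed_clock <- function() as.POSIXct("2026-01-01 12:00:00", tz = "UTC")

# Hypothetical transform: scrub values that cannot be controlled at the source
scrub_volatile <- function(lines) {
  lines <- gsub("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", "TIMESTAMP", lines)
  gsub("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", "UUID", lines)
}

log_line("started", now = fixed_clock)
scrub_volatile("[2026-06-18 05:08:40] job 123e4567-e89b-12d3-a456-426614174000 done")

# In a testthat test, something like:
# expect_snapshot(cat(log_line("started"), sep = "\n"), transform = scrub_volatile)
```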
<h3 id="4-stabilize-platform-differences">4. Stabilize platform differences</h3>
<p>A rendered snapshot depends on more than your code. Fonts render differently on macOS and Linux. A new release of a plotting or formatting dependency shifts the output by a pixel or a label. R itself changes between versions. None of that is a regression, but a literal diff engine can’t tell, so a snapshot recorded on your laptop fails the moment it runs anywhere else.</p>
<p>Two tools handle this, and they work together.</p>
<p><strong>A. Variants keep incompatible environments from overwriting each other.</strong> Both <code>expect_snapshot()</code> and <code>expect_snapshot_file()</code> take a <code>variant</code> argument. testthat stores each variant in its own subdirectory, <code>_snaps/{variant}/</code>, so the macOS render and the Linux render sit side by side instead of clobbering one another. Key the variant on whatever actually moves the output: the operating system, the R version, a specific dependency’s version, or a combination. You decide what the relevant axes of variation are, and you key the snapshots to them. For example, if you want to support rendering on different platforms and with different versions of a plotting library, the variant should include both the platform and the library version:</p>
<pre>variant = paste(shinytest2::platform_variant(), packageVersion(&quot;echarts4r&quot;), sep = &quot;-&quot;)</pre>
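<p>The <code>platform_variant()</code> used in this post comes from shinytest2. If you would rather not depend on it just for a platform name, a variant key can also be assembled in base R. The hypothetical <code>snapshot_variant()</code> below combines the operating system, the R major.minor version and any package versions you pass in. Every input is an argument, so the function itself is easy to test.</p>

```r
# Hypothetical helper: build a variant key such as "linux-r4.4-echarts4r_0.4.5"
snapshot_variant <- function(os = Sys.info()[["sysname"]],
                             r_version = getRversion(),
                             pkg_versions = character()) {
  r_minor <- paste(unlist(r_version)[1:2], collapse = ".")
  parts <- c(tolower(os), paste0("r", r_minor))
  if (length(pkg_versions) > 0) {
    parts <- c(parts, paste0(names(pkg_versions), "_", pkg_versions))
  }
  paste(parts, collapse = "-")
}

snapshot_variant("Darwin", numeric_version("4.4.1"), c(echarts4r = "0.4.5"))

# With real values (requires echarts4r to be installed):
# snapshot_variant(pkg_versions = c(echarts4r = format(packageVersion("echarts4r"))))
```

<p>Keeping the key coarse is deliberate. Including the R patch version, for instance, would create a new snapshot directory on every patch release even when rendering has not changed.</p>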
<p><strong>B. Let one platform generate the truth, and let the whole team use it.</strong> Variants solve the storage problem. They don’t solve the contribution problem: your developers are on different operating systems, and you don’t want each regenerated snapshot to depend on whose machine produced it. If your team works on Windows, macOS and Linux, you may not want to commit three slightly different copies of the same snapshot to the repository. Nominate a single canonical environment, your CI runner, and treat the snapshots it produces as authoritative.</p>
<p>When a snapshot test fails on GitHub Actions, the files that run produced are uploaded as build artifacts. testthat gives you a helper to pull them straight into your local checkout:</p>
<pre>testthat::snapshot_download_gh(
  repository = &quot;your-org/your-package&quot;,
  run_id = &quot;47905180716&quot;
)</pre>
<p><em>This is a fairly <a href="https://tidyverse.org/blog/2025/11/testthat-3-3-0/#other-new-features" rel="nofollow" target="_blank">recent addition</a> to testthat (version 3.3.0), so it’s worth knowing about.</em></p>
<p>You rarely have to look the call up. When snapshots fail inside an <code>R CMD check</code> job, testthat prints the exact <code>snapshot_download_gh()</code> line in the CI log, ready to copy. Run it, review the downloaded files the way you’d review any first snapshot, and commit them.</p>
<p>That turns snapshot testing into a team practice. A contributor on Windows can change a plot, open a pull request, and let CI render the canonical image. The reviewer accepts the snapshot CI produced, not one tied to a particular laptop. The truth comes from one place, and everyone contributes to it through the same door.</p>
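<p>The two approaches can be combined, or you can pick one. Each has trade-offs: variants keep one copy per environment in the repository, while a canonical CI environment keeps a single source of truth but ties every snapshot update to a CI run. <strong>It’s up to you to decide which fits your team and workflow better.</strong></p>
<p>Now that you know your options, try them out and see which works best for you.</p>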
<h3 id="5-name-the-test-and-the-snapshot-so-they-stand-alone">5. Name the test and the snapshot so they stand alone</h3>
<p>A test title should state the precondition and the expected output.</p>
<p>The same holds for snapshot tests. Not “reporter works.” Something like “progress reporter shows survived mutants in summary.” The title is the first thing a reviewer reads when the snapshot changes.</p>
<p>But snapshot tests aren’t self-contained.</p>
<p>The assertion of a snapshot test is a file. That means you read the test, then open the file to see what the expected outcome really is. There is also the reverse workflow: browsing the snapshot directory first to get a sense of what the code produces.</p>
<p>With <code>expect_snapshot_file()</code> you also name the snapshot file yourself. Use that. A file called <code>scatterplot_colors_points_in_the_band.png</code> tells you what it should contain before you even read the test itself. The filename and its content should tell the story on their own, without you having to dig up the test that produced them.</p>
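<p>One way to keep test titles and snapshot file names in sync is to derive the file name from the title. The <code>snapshot_name()</code> helper below is hypothetical, not part of testthat. It lowercases the description and collapses anything that isn't a letter or digit into single underscores.</p>

```r
# Hypothetical helper: turn a test description into a snapshot file name
snapshot_name <- function(description, ext = "png") {
  slug <- tolower(description)
  slug <- gsub("[^a-z0-9]+", "_", slug)  # runs of other characters become "_"
  slug <- gsub("^_+|_+$", "", slug)      # trim leading/trailing underscores
  paste0(slug, ".", ext)
}

snapshot_name("Scatterplot colors points in the band")
snapshot_name("Progress reporter shows survived mutants in summary", ext = "md")

# In a test, something like:
# expect_snapshot_file(path, name = snapshot_name("Scatterplot colors points in the band"))
```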
<hr>
<p>The rest of this post is five worked examples, each leaning on these five practices.</p>
<h2 id="example-1-plots-from-simple-to-interactive">Example 1: plots, from simple to interactive</h2>
<h3 id="ggplot-snapshot-the-svg-not-a-png">ggplot: snapshot the SVG, not a PNG</h3>
<p>For ggplot, the right tool is <a href="https://vdiffr.r-lib.org/" rel="nofollow" target="_blank">vdiffr</a>. It renders a plot to <strong>SVG</strong>, which is text, and snapshots that.</p>
<pre>test_that(&quot;points below lower threshold are green, above upper are red, inside are yellow&quot;, {
  # Arrange
  data &lt;- data.frame(
    x = seq_date(3),
    y = c(10, 20, 30)
  )

  # Act
  p &lt;- threshold_plot(data, lower = 15, upper = 25)

  # Assert
  vdiffr::expect_doppelganger(&quot;threshold_plot_below_threshold_green_above_threshold_red_inside_threshold_yellow&quot;, p)
})</pre>
<p>Two of our practices fall out of this for free. The snapshot is <strong>scoped</strong> to the plot object itself, not a Shiny page that embeds it. And it’s <strong>human-readable</strong>: the recorded <code>.svg</code> is text you can open, and testthat ships a Shiny app (<code>testthat::snapshot_review()</code>) that shows the old and new render side by side when something changes. You review the picture, but the artifact under version control is inspectable text.</p>
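<p>The test calls <code>seq_date()</code> and <code>threshold_plot()</code>, which aren't shown here. To make the example runnable, here is one hypothetical version of each, assuming ggplot2. It is a sketch, not the author's code. The colour rule is split into a plain <code>band_colour()</code> function so it can also be unit-tested directly.</p>

```r
# Hypothetical test helper: n consecutive daily dates
seq_date <- function(n, from = as.Date("2020-01-01")) {
  seq(from, by = "day", length.out = n)
}

# Hypothetical colour rule: below lower = green, above upper = red, otherwise yellow
band_colour <- function(y, lower, upper) {
  ifelse(y < lower, "green", ifelse(y > upper, "red", "yellow"))
}

# Hypothetical plot function (needs ggplot2 installed when called)
threshold_plot <- function(data, lower, upper) {
  data$colour <- band_colour(data$y, lower, upper)
  ggplot2::ggplot(data, ggplot2::aes(x = x, y = y, colour = colour)) +
    ggplot2::geom_point(size = 3) +
    ggplot2::geom_hline(yintercept = c(lower, upper), linetype = "dashed") +
    ggplot2::scale_colour_identity()
}

band_colour(c(10, 20, 30), lower = 15, upper = 25)
```

<p>Splitting out the colour rule lets a fast <code>expect_equal()</code> test pin down the boundary behaviour, so the snapshot only has to cover the rendering. In this sketch, values exactly at a threshold count as inside, which is itself an assumption.</p>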
<p>Here’s the actual plot that test captures:</p>
<p><img alt="Threshold plot: green point below 15, yellow within, red above 25" loading="lazy" decoding="async" fetchpriority="auto" width="450" src="https://jakubsobolewski.com/_astro/threshold_plot.BfKB71nF_2o73BG.webp" ></p>
<p>And here are the first lines of the SVG vdiffr would record. This is the whole point of <a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots#2-make-snapshots-human-readable" rel="nofollow" target="_blank">practice #2</a>: the snapshot under version control is text you can read.</p>
<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;svg xmlns=&quot;http://www.w3.org/2000/svg&quot; xmlns:xlink=&quot;http://www.w3.org/1999/xlink&quot; width=&quot;450&quot; viewBox=&quot;0 0 504 288&quot;&gt;
&lt;defs&gt;
&lt;g&gt;
&lt;g id=&quot;glyph-0-0&quot;&gt;</pre>
<p>Comparing text is easier than comparing pixels. If you can save an image as SVG, prefer it to PNG. Not only can you view an SVG both as text and as an image, but the diff engine can also tell you exactly what changed in the markup, instead of just reporting that pixels differ.</p>
<p>But there are plenty of cases when SVG isn’t an option.</p>
<h3 id="htmlwidgets-when-youre-forced-back-to-pixels">htmlwidgets: when you’re forced back to pixels</h3>
<p>vdiffr can’t render an htmlwidget or a Shiny tag list, because there’s no static SVG to produce. Here you fall back to rendering the thing to a real PNG and comparing images. That’s a harder problem, and it’s where scoping and determinism stop being nice-to-haves.</p>
<p>The shape of the helper: render to HTML, screenshot to PNG with <code>webshot</code>, then hand the PNG to <code>expect_snapshot_file()</code>.</p>
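<pre>expect_plot &lt;- function(x, name, ...) {
  UseMethod(&quot;expect_plot&quot;)
}

# Render an htmlwidget to HTML, screenshot it to PNG with webshot,
# and snapshot the PNG. Uses htmlwidgets, webshot, fs and shinytest2.
expect_plot.htmlwidget &lt;- function(
  x,
  name,
  variant = shinytest2::platform_variant(),
  width = 992,
  height = 744
) {
  testthat::local_edition(3)

  # Work in a private temporary directory and remove it, recursively, on exit
  tmp_dir &lt;- tempfile(&quot;expect_plot_&quot;)
  dir.create(tmp_dir)
  on.exit(unlink(tmp_dir, recursive = TRUE), add = TRUE)
  html_temp &lt;- fs::path(tmp_dir, name, ext = &quot;html&quot;)
  png_temp &lt;- fs::path(tmp_dir, name, ext = &quot;png&quot;)

  htmlwidgets::saveWidget(
    x,
    file = html_temp,
    selfcontained = FALSE
  )
  # Screenshot the rendered page at a fixed viewport size
  webshot::webshot(
    url = html_temp,
    file = png_temp,
    delay = 0.5,
    quiet = TRUE,
    vwidth = width,
    vheight = height
  )

  # Compare against the recorded PNG for this platform variant
  testthat::expect_snapshot_file(
    png_temp,
    name = fs::path(name, ext = &quot;png&quot;),
    variant = variant
  )
}</pre>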
<p>I’ve used this pattern across many projects, and it has always worked well for me.</p>
<p><strong>The threshold.</strong> A pixel-exact comparison of a rendered chart will fail on trivial, invisible differences in anti-aliasing. So the full version of this helper passes a custom <code>compare</code> function to <code>expect_snapshot_file()</code> (not shown in the code above) that allows a small per-pixel difference budget before it reports a mismatch. Locally (<code>interactive()</code>) the threshold is <code>0</code>, because you want to see every change as you work. On CI it’s relaxed, because the CI renderer isn’t identical to your laptop.</p>
<p><strong>The variant.</strong> Fonts render differently across operating systems, so a snapshot recorded on macOS will not match one produced on Linux. <code>platform_variant()</code> keys the snapshot to the platform (by default it also includes the R version), and <code>expect_snapshot_file()</code> keeps a separate recorded file per variant (for example <code>_snaps/mac-4.4/...</code> and <code>_snaps/linux-4.4/...</code>), so cross-platform rendering differences never masquerade as a regression. This is the storage half of <a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots#4-stabilize-platform-differences" rel="nofollow" target="_blank">practice #4</a>; pair it with <code>snapshot_download_gh()</code> so CI generates the canonical files the whole team commits.</p>
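<p>The helper as printed has no <code>compare</code> argument, so here is one possible way to implement the thresholded comparison described above. <code>expect_snapshot_file()</code> accepts a <code>compare</code> function that receives the paths of the old and new files and returns <code>TRUE</code> or <code>FALSE</code>. This sketch measures the share of pixels that differ in any colour channel. The <code>images_match()</code> and <code>compare_png_tolerant()</code> names are hypothetical, and the use of the png package and the 0.001 CI budget are assumptions, not the author's values.</p>

```r
# Share of pixels that differ (in any channel) must stay within `threshold`
images_match <- function(a, b, threshold = 0, tolerance = 1e-8) {
  if (!identical(dim(a), dim(b))) return(FALSE)
  differing <- abs(a - b) > tolerance
  if (length(dim(differing)) == 3) {
    differing <- apply(differing, c(1, 2), any)  # collapse colour channels
  }
  mean(differing) <= threshold
}

# Hypothetical comparator factory for expect_snapshot_file(compare = ...):
# exact when run interactively, a small budget elsewhere (illustrative value)
compare_png_tolerant <- function(threshold = if (interactive()) 0 else 0.001) {
  force(threshold)
  function(old, new) {
    images_match(png::readPNG(old), png::readPNG(new), threshold = threshold)
  }
}

# Usage inside expect_plot.htmlwidget(), something like:
# testthat::expect_snapshot_file(png_temp, name = fs::path(name, ext = "png"),
#                                variant = variant, compare = compare_png_tolerant())
```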
<p>The test that uses it reads like any other, with a descriptive title and a named snapshot:</p>
<pre>it(&quot;colors timepoints between thresholds&quot;, {
  # Arrange
  data &lt;- data.frame(
    x = seq_date(3),
    y = c(10, 20, 30)
  )

  # Act
  plot &lt;- threshold_plot(data, lower = 15, upper = 25)

  # Assert
  expect_plot(
    plot,
    name = &quot;colors_timepoints_between_thresholds&quot;
  )
})</pre>
<h3 id="interactive-plots-you-can-even-snapshot-an-interaction">Interactive plots: you can even snapshot an interaction</h3>
<p>An htmlwidget is just HTML and JavaScript. That means you can drive it into a specific state and snapshot <em>that</em>. Here’s an <code>echarts4r</code> line chart where the behavior under test is the tooltip content.</p>
<p>A small helper dispatches the chart’s own “show tooltip” action when the widget loads, simulating a user interacting with the plot:</p>
<pre>#' tests/testthat/setup-trigger_tooltip.R
trigger_tooltip &lt;- function(x, series_index, data_index) {
  htmlwidgets::onRender(
    x,
    sprintf(
      &quot;function(el, x, data) {
        const chart = echarts.getInstanceByDom(el);
        chart.dispatchAction({
          type: 'showTip',
          seriesIndex: %s,
          dataIndex: %s,
        });
      }&quot;,
      series_index,
      data_index
    )
  )
}</pre>
<p>Then the test renders the chart, triggers the tooltip, and snapshots the result. Note that the Act step triggers the tooltip rather than creating the plot:</p>
<pre>it(&quot;shows tooltip content from the specified tooltip column&quot;, {
  # Arrange
  data &lt;- data.frame(
    date = as.Date(
      c(&quot;2020-01-01&quot;, &quot;2020-02-01&quot;, &quot;2020-03-01&quot;)
    ),
    value = c(1, 3, 2),
    tooltip = &quot;TOOLTIP&quot;
  )
  plot &lt;- line_plot(
    data,
    date = &quot;date&quot;,
    value = &quot;value&quot;,
    tooltip = &quot;tooltip&quot;
  )

  # Act
  result &lt;- plot %&gt;%
    trigger_tooltip(series_index = 0, data_index = 1)

  # Assert
  expect_plot(result, &quot;line_plot_tooltip&quot;)
})</pre>
<p>The tooltip text is doing double duty. It’s the data the test checks, and it’s a human-readable marker that tells the reviewer exactly what to look for when accepting the snapshot. Here’s the PNG that snapshot captures:</p>
<p><img alt="echarts line chart with the tooltip ‘TOOLTIP’ shown" loading="lazy" decoding="async" fetchpriority="auto" width="450" src="https://jakubsobolewski.com/_astro/line_plot_tooltip.Bye5hnIf_ZKTGFd.webp" ></p>
<h2 id="example-2-printed-output-reporters-loggers-cli">Example 2: printed output (reporters, loggers, CLI)</h2>
<p>Some objects exist to print.</p>
<p>Test reporters, loggers, CLI tools. Their whole job is to render formatted text to the console.</p>
<p>That text <em>is</em> the behavior, and <code>expect_snapshot()</code> captures it verbatim into a readable <code>.md</code> file.</p>
<p>I use this in <a href="https://github.com/jakubsob/muttest" rel="nofollow" target="_blank">muttest</a>, a mutation testing package, to pin down exactly what the progress reporter prints. The test is small:</p>
<pre>test_that(&quot;progress reporter shows all killed&quot;, {
  .with_example_dir(&quot;shipping/&quot;, {
    mutators &lt;- list(operator(&quot;&gt;&quot;, &quot;&lt;&quot;))
    plan &lt;- muttest_plan(mutators, fs::dir_ls(&quot;R&quot;))
    .expect_snapshot(
      muttest(
        plan,
        reporter = ProgressMutationReporter$new(
          min_time = Inf,
          survived_detail = &quot;none&quot;
        )
      )
    )
  })
})</pre>
<p>But a reporter’s output is full of nondeterminism: spinner frames, blank lines, and per-step timings like <code>[0.3s]</code> and <code>Duration: 1.2s</code>. Snapshot that raw and it fails on every run. The fix is <a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots#3-remove-nondeterminism-or-filter-whats-left" rel="nofollow" target="_blank">practice #3</a>: a <code>transform</code> that strips the noise before comparison. I wrap <code>expect_snapshot()</code> once, in <code>setup.R</code>, so every test in the suite gets the cleaning for free:</p>
<pre>.expect_snapshot &lt;- purrr::partial(
  testthat::expect_snapshot,
  transform = function(lines) {
    lines |&gt;
      stringr::str_subset(&quot;^[\\|/\\-\\\\] \\|&quot;, negate = TRUE) |&gt;
      stringr::str_subset(&quot;^$&quot;, negate = TRUE) |&gt;
      stringr::str_remove_all(&quot;\\s\\[\\d+.\\d+s\\]&quot;) |&gt;
      stringr::str_remove_all(&quot;Duration:\\s\\d+.\\d+\\ss&quot;) |&gt;
      stringr::str_trim()
  }
)</pre>
<p>The first two <code>str_subset</code> calls drop spinner lines and blanks. The two <code>str_remove_all</code> calls delete the timing fragments. What’s left is the stable, meaningful part of the output, and that’s what lands in the snapshot:</p>
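<p>To see what these filters keep without running muttest, here is a base-R mirror of the same <code>transform</code>, applied to a few made-up reporter lines. It is a sketch for experimenting, not the muttest code, and <code>clean_reporter_lines()</code> is a hypothetical name. <code>perl = TRUE</code> is used so the backslash escapes inside the bracket expression are interpreted as they are in stringr’s ICU engine.</p>

```r
# Base-R mirror of the stringr transform above, for experimenting only.
clean_reporter_lines <- function(lines) {
  # Drop spinner frames such as "/ | ..." and blank lines.
  lines <- lines[!grepl("^[\\|/\\-\\\\] \\|", lines, perl = TRUE)]
  lines <- lines[!grepl("^$", lines)]
  # Delete per-step timings like " [0.3s]" and the duration fragment.
  lines <- gsub("\\s\\[\\d+.\\d+s\\]", "", lines, perl = TRUE)
  lines <- gsub("Duration:\\s\\d+.\\d+\\ss", "", lines, perl = TRUE)
  trimws(lines)
}

# Made-up sample output, for illustration only.
raw <- c(
  "i Mutation Testing",
  "",
  "/ |   0 | shipping.R",
  "v |   1 | shipping.R [0.3s]"
)
clean_reporter_lines(raw)
```

<p>The spinner frame and the blank line are dropped, and the <code>[0.3s]</code> timing is stripped from the result row, leaving two stable lines. Only the result row keeps its leading <code>v</code>, which is not in the spinner character class, so it survives the first filter.</p>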
<pre># progress reporter shows all killed

    Code
      muttest(plan, reporter = ProgressMutationReporter$new(min_time = Inf,
        survived_detail = &quot;none&quot;))
    Output
      i Mutation Testing
        |   K |   S |   E |   T |   % | Mutator  | File
      v |   1 |   0 |   0 |   1 | 100 | &gt; → &lt;    | shipping.R
      -- Results ---------------------------------------------------------------------
      [ KILLED 1 | SURVIVED 0 | ERRORS 0 | TOTAL 1 | SCORE 100.0% ]</pre>
<p>This is a snapshot doing exactly what it should. The table alignment, the symbols, the score line. None of that is pleasant to assert by hand, but all of it is obviously correct (or obviously wrong) at a glance. The full reporter source, setup, and recorded snapshots are in the muttest repo: <a href="https://github.com/jakubsob/muttest/blob/main/tests/testthat/setup.R" rel="nofollow" target="_blank">setup.R</a>, <a href="https://github.com/jakubsob/muttest/blob/main/tests/testthat/test-test_reporter-progress.R" rel="nofollow" target="_blank">the test</a>, and <a href="https://github.com/jakubsob/muttest/blob/main/tests/testthat/_snaps/test_reporter-progress.md" rel="nofollow" target="_blank">the snapshot</a>.</p>
<h2 id="example-3-data-frames-as-csv">Example 3: data frames as CSV</h2>
<p>Data frames are a classic snapshot candidate, and a classic way to get it wrong.</p>
<p>The tempting move is <code>expect_snapshot(print(df))</code>. Don’t. Printed data frames are truncated past a certain size, formatted to your console width, and shown in whatever order the rows happen to be in. You’re snapshotting the print method, not the data.</p>
<p>Write the data frame to CSV. CSV is text, it diffs cleanly, and it’s the obvious human-readable representation of tabular data.</p>
<p><strong>I find snapshotting tables especially useful when a business expert needs to sign off on a business-logic calculation.</strong> Instead of walking them through a table built in code, you can hand over the CSV, or even a formatted Markdown table, for review. The expert can then sign off on the calculation without needing to read the code.</p>
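<p>If you go the Markdown route, a few lines of base R are enough to produce a pipe-style table. The helper below, <code>df_to_markdown()</code>, is hypothetical and not from the post; <code>knitr::kable(x, format = &quot;pipe&quot;)</code> does the same job if knitr is available.</p>

```r
# Hypothetical helper: render a data frame as a pipe-style Markdown table
# that a domain expert can read without opening the code.
df_to_markdown <- function(x) {
  header <- paste0("| ", paste(names(x), collapse = " | "), " |")
  rule <- paste0("|", paste(rep("---", ncol(x)), collapse = "|"), "|")
  if (nrow(x) == 0) return(c(header, rule))
  # unname() so a column called "sep" can't clash with paste()'s argument.
  cells <- unname(lapply(x, as.character))
  rows <- paste0("| ", do.call(paste, c(cells, sep = " | ")), " |")
  c(header, rule, rows)
}

revenue <- data.frame(region = c("North", "South"), revenue = c(10.5, 20))
writeLines(df_to_markdown(revenue))
```

<p>Inside a test, one option would be <code>expect_snapshot(writeLines(df_to_markdown(x)))</code>, which records the table in the <code>.md</code> snapshot, so the table the expert reviewed and the table the test checks are the same artifact.</p>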
<p>Following the same S3 pattern as <code>expect_plot</code>, here’s a custom expectation. The comparison is the part worth getting right, so it lives in its own named function:</p>
<pre>compare_df &lt;- function(old, new) {
  # Compare parsed data frames, not raw CSV text
  # Notice you can use custom comparison functions here
  isTRUE(all.equal(
    read.csv(old, stringsAsFactors = FALSE),
    read.csv(new, stringsAsFactors = FALSE)
  ))
}

expect_snap &lt;- function(x, name, ...) {
  UseMethod(&quot;expect_snap&quot;)
}

expect_snap.data.frame &lt;- function(x, name, ...) {
  local_edition(3)

  # Practice #3: deterministic order so row shuffling never breaks the test.
  # unname() stops a column called e.g. &quot;decreasing&quot; from being
  # matched to one of order()'s own arguments.
  x &lt;- x[do.call(order, unname(as.list(x))), , drop = FALSE]
  rownames(x) &lt;- NULL

  path &lt;- fs::path(tempdir(), name, ext = &quot;csv&quot;)
  on.exit(unlink(path))

  expect_snapshot_file(
    path = local({
      write.csv(x, path, row.names = FALSE)
      path
    }),
    name = fs::path(name, ext = &quot;csv&quot;),
    compare = compare_df
  )
}</pre>
<p>Two choices make this robust. Sorting on all columns before writing (that’s <a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots#3-remove-nondeterminism-or-filter-whats-left" rel="nofollow" target="_blank">practice #3</a>) makes the snapshot order-independent. And <code>compare_df</code> reads both files back into data frames and compares <em>those</em>, not the raw text, so a trailing newline, a quoting difference, or an integer written as <code>1</code> versus <code>1.0</code> never fails the test.</p>
<p>That second claim is the one worth verifying, and <code>compare_df</code> is an ordinary function, so it gets an ordinary unit test. Two files holding the same data but different CSV text must compare equal; a genuine change must not:</p>
<pre>test_that(&quot;compare_df ignores CSV formatting but catches value changes&quot;, {
  # Arrange
  recorded    &lt;- tempfile(fileext = &quot;.csv&quot;)
  reformatted &lt;- tempfile(fileext = &quot;.csv&quot;)
  changed     &lt;- tempfile(fileext = &quot;.csv&quot;)
  writeLines(c(&quot;x,y&quot;,     &quot;1,10.5&quot;, &quot;2,20.1&quot;),     recorded)
  # quoted, trailing newline
  writeLines(c('&quot;x&quot;,&quot;y&quot;', &quot;1,10.5&quot;, &quot;2,20.1&quot;, &quot;&quot;), reformatted)
  # a real change
  writeLines(c(&quot;x,y&quot;,     &quot;1,10.5&quot;, &quot;2,99.9&quot;),     changed)

  # Act &#038; Assert
  expect_true(compare_df(recorded, reformatted))
  expect_false(compare_df(recorded, changed))
})
Test passed with 2 successes &#x1f600;.</pre>
<p>The test passes: formatting noise doesn’t fail the comparison, a changed value does. You snapshot to a readable format, but you compare on meaning.</p>
<h2 id="example-4-errors-and-conditions">Example 4: errors and conditions</h2>
<p>User-facing messages are a contract. When a function fails, the error text is part of its behavior, and <code>expect_snapshot()</code> pins it.</p>
<pre>test_that(&quot;withdrawing more than the balance reports the shortfall&quot;, {
  expect_snapshot(
    withdraw(account(balance = 50), amount = 80),
    error = TRUE
  )
})</pre>
<p>The <code>error = TRUE</code> argument tells testthat that the code is expected to throw, so it captures the condition instead of failing the test. The message goes into the snapshot:</p>
<pre># withdrawing more than the balance reports the shortfall

    Code
      withdraw(account(balance = 50), amount = 80)
    Condition
      Error in `withdraw()`:
      ! Cannot withdraw 80 from an account with balance 50.
      i Available to withdraw: 50.</pre>
<p>Now if someone changes that message (softens it, drops the available balance, mangles the formatting), the test fails and shows the diff. The same works for warnings and messages. It’s the cleanest way to keep error messages from silently degrading. (The flip side: a message worth snapshotting is a message worth writing carefully. See the <a href="https://jakubsobolewski.com/blog/test-smells-in-r/" rel="nofollow" target="_blank">Mystery Guest and Overspecification smells</a> for the failure modes nearby.)</p>
<h2 id="example-5-nested-data-structures">Example 5: nested data structures</h2>
<p>Some outputs are big nested lists: a parsed config, an API response, a model object’s metadata. Asserting them field by field is miserable:</p>
<pre>expect_equal(result$user$name, &quot;Ada&quot;)
expect_equal(result$user$roles, c(&quot;admin&quot;, &quot;editor&quot;))
expect_equal(result$settings$theme, &quot;dark&quot;)
expect_equal(result$settings$notifications$email, TRUE)
# ... twenty more lines</pre>
<p>Each line is a place to make a typo, and together they still might not cover every field. Snapshot the whole structure once, review it once.</p>
<p>The only real decision is the serialization format, and <a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots#2-make-snapshots-human-readable" rel="nofollow" target="_blank">practice #2</a> decides it. Don’t use <code>dput()</code>; its output is valid R but painful to read. Serialize to pretty JSON or YAML, which a human can actually scan:</p>
<pre>result &lt;- list(
  user = list(name = &quot;Ada&quot;, roles = c(&quot;admin&quot;, &quot;editor&quot;)),
  settings = list(
    theme = &quot;dark&quot;,
    notifications = list(email = TRUE, sms = FALSE)
  )
)

cat(jsonlite::toJSON(result, pretty = TRUE, auto_unbox = TRUE))
{
  &quot;user&quot;: {
    &quot;name&quot;: &quot;Ada&quot;,
    &quot;roles&quot;: [&quot;admin&quot;, &quot;editor&quot;]
  },
  &quot;settings&quot;: {
    &quot;theme&quot;: &quot;dark&quot;,
    &quot;notifications&quot;: {
      &quot;email&quot;: true,
      &quot;sms&quot;: false
    }
  }
}</pre>
<p>Wrap that in a snapshot expectation and the recorded file is a clean, indented JSON document. One review covers the entire structure, and any change to any field shows up as a precise diff.</p>
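<p>jsonlite writes keys in the list’s own order. If that order can vary between runs (for example, when the structure is assembled from an API response or a loop over an unordered source), one option is to normalise it before serializing. The recursive <code>sort_keys()</code> below is a hypothetical helper, not part of jsonlite; it reorders named elements only and leaves the order of values inside vectors untouched, since that order is usually data.</p>

```r
# Hypothetical helper: recursively sort named list elements so the JSON
# key order does not depend on how the list was assembled.
sort_keys <- function(x) {
  # Leave atomic vectors and data frames untouched.
  if (!is.list(x) || is.data.frame(x)) return(x)
  # method = "radix" sorts names in the C locale, independent of the machine.
  if (!is.null(names(x))) x <- x[order(names(x), method = "radix")]
  lapply(x, sort_keys)
}

# Same content, built in a different order.
a <- list(user = list(name = "Ada"), settings = list(theme = "dark"))
b <- list(settings = list(theme = "dark"), user = list(name = "Ada"))
identical(sort_keys(a), sort_keys(b)) # TRUE
```

<p>The snapshot would then serialize <code>jsonlite::toJSON(sort_keys(result), pretty = TRUE, auto_unbox = TRUE)</code> instead of <code>result</code> directly.</p>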
<p>Use it sparingly: not every big output needs a snapshot test. Sometimes it’s better to assert on the shape and the values that actually matter.</p>
<h2 id="the-responsibility">The responsibility</h2>
<p>Here is where snapshot testing lives or dies.</p>
<p><strong>The first snapshot is a decision, not a fact.</strong> When testthat records a new snapshot, the test passes. That green check does not mean the output is correct. It means the output now exists. The only thing that makes it correct is <em>you reading it and deciding it is.</em> Accept a snapshot without reading it and you’ve written a test that asserts “the code does whatever it currently does,” which is no test at all.</p>
<p>When a snapshot changes, testthat tells you and gives you tools to review:</p>
<pre># opens a diff app for changed snapshots
testthat::snapshot_review()
# accept changes once you've reviewed them
testthat::snapshot_accept()</pre>
<p><code>snapshot_review()</code> is the honest path: it shows you old versus new and makes you look. <code>snapshot_accept()</code> without looking is how snapshot suites become worthless. The reason <a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots#2-make-snapshots-human-readable" rel="nofollow" target="_blank">practice #2</a> (human-readable) matters so much is that it’s what makes this review <em>possible</em>. A binary blob can’t be reviewed, so you’d rubber-stamp it by necessity.</p>
<p>And snapshots are part of your codebase. They’re checked into version control and they show up in pull requests. A changed snapshot in a diff deserves the same scrutiny as a changed function, often more, because it’s the line where “the behavior changed” becomes visible.</p>
<h2 id="build-your-own-snapshot-expectations">Build your own snapshot expectations</h2>
<p>Notice what <code>expect_plot</code>, <code>expect_snap.data.frame</code>, and the <code>transform</code>-wrapped <code>.expect_snapshot</code> have in common. Each one is a <strong>domain-specific expectation</strong> that bakes the relevant practices into a reusable function:</p>
<ul>
<li><code>expect_plot</code> handles scoping, the difference threshold, and platform variants.</li>
<li><code>expect_snap.data.frame</code> handles deterministic ordering and meaning-based comparison.</li>
<li><code>.expect_snapshot</code> handles filtering nondeterministic console output.</li>
</ul>
<p>You decide <em>once</em> how a given kind of output should be captured, made readable, and made deterministic. Then every test that uses the expectation gets it right for free. That’s the real payoff. You can use <code>expect_snapshot()</code> as is, but you can also build on testthat’s snapshot functions to create expectations that fit your domain and are more reusable and more expressive. The expectations you build on top of them are where snapshot testing becomes a tool your whole suite can lean on.</p>
<h2 id="cheat-sheet">Cheat sheet</h2>
<table><thead><tr><th>Output type</th><th>Capture as</th><th>How to keep it deterministic</th></tr></thead><tbody><tr><td>ggplot</td><td>SVG (<code>vdiffr</code>)</td><td>Fixed data; vdiffr handles rendering</td></tr><tr><td>htmlwidget / Shiny</td><td>PNG (<code>webshot</code>)</td><td>Difference threshold + platform variant</td></tr><tr><td>Interactive widget</td><td>PNG of a state</td><td>Drive a deterministic action, then take the screenshot</td></tr><tr><td>Console / reporter</td><td><code>.md</code> text</td><td><code>transform</code> to strip timing/spinners</td></tr><tr><td>Data frame</td><td>CSV</td><td>Sort rows; compare parsed frames</td></tr><tr><td>Error / warning</td><td><code>.md</code> text</td><td><code>error = TRUE</code>; message is already fixed</td></tr><tr><td>Nested structure</td><td>pretty JSON/YAML</td><td>Stable key order from the serializer</td></tr></tbody></table>
<p>The technique is the same everywhere: capture, save, compare. What changes is the format and how you tame the noise. Keep snapshots <strong>scoped</strong> so failures mean something, <strong>readable</strong> so you can review them, <strong>deterministic</strong> so they don’t flake, <strong>platform-stable</strong> so rendering differences don’t masquerade as regressions, and <strong>well-named</strong> so they stand on their own.</p>
<p>Do that, and snapshot testing covers far more ground than the screenshots it’s famous for.</p>
<h2 id="apply-this">Apply this</h2>
<p>Reading about the practices is easy. Applying them to a real suite is the work. The fastest way to internalize them is to point an AI agent at your own snapshot tests, have it find where they drift from the five practices, then fix the worst few yourself; that’s how you learn.</p>
<p>Open your test files in your AI coding agent (Claude Code, Cursor, Copilot Chat) and paste this prompt:</p>
<pre>You are a senior R engineer reviewing a test suite's use of snapshot testing (testthat 3rd edition: expect_snapshot / expect_snapshot_file, plus vdiffr for plots).

Scope: audit the test file(s) I've shared AND the production code they exercise. Snapshot quality usually can't be judged from the test alone — you need to see what's being captured and where any nondeterminism comes from.

Judge every snapshot against five practices, and flag where each is violated:

1. Scoped — the snapshot captures exactly the behavior under test, nothing more. A test for one chart that snapshots a whole dashboard (header, sidebar, &quot;last refreshed&quot; timestamp) fails on unrelated changes. Fix: capture the smallest artifact that proves the behavior.
2. Human-readable — the snapshot is text a reviewer can read in a diff: .md, .csv, .svg, pretty JSON/YAML. Flag binary or .rds snapshots; they can't be reviewed, so they get rubber-stamped. Fix: serialize to a text format.
3. Deterministic — the snapshot is identical run to run. Flag captured timestamps, random IDs, elapsed-time/durations, unordered query results, locale-dependent formatting. Fix at the source first (inject a fixed clock / seed / IDs — dependency injection); if you can't, filter with the `transform` argument, or sort rows before serializing.
4. Platform-stable — rendered (image) snapshots use a `variant` keyed on OS / R version / key dependency version, and image comparison allows a tolerance instead of pixel-exact equality. Flag a single _snaps file shared across platforms, or a zero-threshold pixel compare on CI. Mention the CI-as-source-of-truth workflow (testthat::snapshot_download_gh) where relevant.
5. Well-named — the test title states the precondition and the expected output (not &quot;&lt;fn&gt; works&quot;), and expect_snapshot_file snapshots get an explicit, descriptive name. The filename and title should explain the artifact without opening the test.

Also check suitability and reuse:
- Wrong tool: a snapshot of a single scalar or boolean should be expect_equal(); a 40-line field-by-field expect_equal() on a big nested object could be a snapshot. Flag both directions.
- Missing abstraction: if the same capture / clean / serialize logic repeats across tests, recommend extracting a domain-specific expectation (e.g. expect_plot(), expect_snap.data.frame()) that bakes the practices in once.

Rules:
- Only flag clear instances. Don't invent issues to look thorough.
- Quote the offending lines and cite file:line for every finding.
- Never change what the production code does — only its testability. If a fix needs dependency injection (a signature change), say so and describe the new interface.

Output:
1. A triage table: practice violated | file:line | severity (high/med/low) | one-line why.
2. Then fix the highest-severity findings as before/after code blocks — the smallest change that removes the problem. Stop after three and ask before continuing if more remain.
3. Tell me how to re-run just these tests, and which snapshots I'll need to review and accept.</pre>
<p>Before you accept a snapshot, run it past this checklist:</p>
<ul class="contains-task-list">
<li class="task-list-item"><input type="checkbox" disabled> You read the recorded output and decided it’s correct — you didn’t just accept the green check.</li>
<li class="task-list-item"><input type="checkbox" disabled> The snapshot captures only the behavior under test, so a failure means something.</li>
<li class="task-list-item"><input type="checkbox" disabled> It’s stored in human-readable format, not binary, so you can read and diff it in a pull request.</li>
<li class="task-list-item"><input type="checkbox" disabled> Nothing in it changes run to run — no timestamps, random IDs, or incidental ordering.</li>
<li class="task-list-item"><input type="checkbox" disabled> The test title and snapshot filename state the behavior on their own.</li>
</ul>
<p>Want to turn these habits into a path you can work through for your whole suite? The <a href="https://jakubsobolewski.com/get-roadmap" rel="nofollow" target="_blank">R testing roadmap</a> lays out the steps.</p>
<h2 id="references">References</h2>
<ol>
<li>testthat — <a href="https://testthat.r-lib.org/articles/snapshotting.html" rel="nofollow" target="_blank">Snapshot tests</a></li>
<li>vdiffr — <a href="https://vdiffr.r-lib.org/" rel="nofollow" target="_blank">Visual regression testing for ggplot2</a></li>
<li>muttest — <a href="https://github.com/jakubsob/muttest/blob/main/tests/testthat/test-test_reporter-progress.R" rel="nofollow" target="_blank">progress reporter snapshot tests</a></li>
<li><a href="https://jakubsobolewski.com/blog/test-smells-in-r/" rel="nofollow" target="_blank">11 Test Smells That Make Your Tests Lie to You</a></li>
</ol>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://jakubsobolewski.com/blog/snapshot-testing-beyond-screenshots"> Jakub Sobolewski</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/snapshot-testing-in-r-beyond-screenshots/">Snapshot Testing in R: Beyond Screenshots</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402125</post-id>	</item>
		<item>
		<title>TheseusPlot 0.3.0: Visualizing the Decomposition of Differences in Rate Metrics</title>
		<link>https://www.r-bloggers.com/2026/06/theseusplot-0-3-0-visualizing-the-decomposition-of-differences-in-rate-metrics/</link>
		
		<dc:creator><![CDATA[Koji Makiyama]]></dc:creator>
		<pubDate>Wed, 17 Jun 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>TheseusPlot is an R package that decomposes differences in a rate metric between two groups into subgroup-level contributions and visualizes the results as a “Theseus Plot”.<br />
For example, when a click-through rate, conversion rate, or retention r...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/theseusplot-0-3-0-visualizing-the-decomposition-of-differences-in-rate-metrics/">TheseusPlot 0.3.0: Visualizing the Decomposition of Differences in Rate Metrics</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/"> HOXO-M Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>
 





<p><strong>TheseusPlot</strong> is an R package that decomposes differences in a rate metric between two groups into subgroup-level contributions and visualizes the results as a “Theseus Plot”.</p>
<p>For example, when a click-through rate, conversion rate, or retention rate differs between two time periods or groups, TheseusPlot helps answer questions such as: which subgroup contributed most to the difference?</p>
<p>Suppose that the click-through rate (CTR) was 6.2% in 2024 and 5.2% in 2025, a decrease of 1.0 percentage point. A Theseus Plot can show how this decrease is decomposed: in this example, 0.8 percentage points are assigned to male users and 0.2 percentage points to female users under the decomposition.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i1.wp.com/hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/figures/example-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>Version 0.3.0 is now available on CRAN. This release fixes a compatibility issue with waterfalls 1.1.4, improves subgroup size bar rendering, and refines several plot defaults.</p>
<section id="whats-new-in-0.3.0" class="level2">
<h2 class="anchored" data-anchor-id="whats-new-in-0.3.0">What’s new in 0.3.0</h2>
<section id="cleaner-plot-labels" class="level3">
<h3 class="anchored" data-anchor-id="cleaner-plot-labels">Cleaner plot labels</h3>
<p>In earlier versions, TheseusPlot automatically displayed the analyzed column name as a subtitle. However, this was not always useful, especially when the plot was already used in a document or presentation where the context was clear.</p>
<p>In version 0.3.0, the automatic column-name subtitle has been removed. This makes the resulting plots cleaner and easier to combine with custom titles, captions, and surrounding text.</p>
<p>This release also adds an <code>xlab</code> argument to <code>create_ship()</code>, so you can customize the x-axis label used by <code>plot()</code> and <code>plot_flip()</code>.</p>
<p>For example:</p>
<div class="cell">
<pre>ship &lt;- create_ship(
  data_2024,
  data_2025,
  y = clicked,
  labels = c(&quot;2024&quot;, &quot;2025&quot;),
  xlab = &quot;Gender&quot;,
  ylab = &quot;CTR (%)&quot;
)

ship$plot(gender)</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/figures/xlab-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>This is useful when the column name in the data is short or technical, but you want a more readable label in the plot.</p>
</section>
<section id="better-default-labels" class="level3">
<h3 class="anchored" data-anchor-id="better-default-labels">Better default labels</h3>
<p>The default group labels have been changed from <code>&quot;Original&quot;</code> and <code>&quot;Refitted&quot;</code> to <code>&quot;Baseline&quot;</code> and <code>&quot;Comparison&quot;</code>.</p>
<div class="cell">
<pre>ship &lt;- create_ship(
  data_2024,
  data_2025,
  y = clicked
)

ship$plot(gender)</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/figures/no-labels-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
<p>The previous labels reflected the internal idea of replacing one group with another, but they were not always intuitive for users. The new defaults better match common comparison scenarios, such as year-over-year comparisons, control versus treatment, and before-and-after analyses.</p>
<p>Of course, you can still specify your own labels:</p>
<div class="cell">
<pre>ship &lt;- create_ship(
  data_Nov,
  data_Dec,
  y = on_time,
  labels = c(&quot;November&quot;, &quot;December&quot;)
)</pre>
</div>
</section>
<section id="simpler-numeric-display" class="level3">
<h3 class="anchored" data-anchor-id="simpler-numeric-display">Simpler numeric display</h3>
<p>The default number of displayed decimal places has been changed from 3 to 1.</p>
<p>In many plots, three decimal places made the labels more detailed than necessary. Since TheseusPlot is mainly intended to help users understand the structure of a metric difference, one decimal place is often enough for visual interpretation.</p>
<p>You can still control the precision with the <code>digits</code> argument when needed.</p>
<div class="cell">
<pre>ship &lt;- create_ship(
  data_2024,
  data_2025,
  y = clicked,
  labels = c(&quot;2024&quot;, &quot;2025&quot;),
  digits = 2
)

ship$plot(gender)</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i0.wp.com/hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/figures/digits-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="plot-improvements-and-bug-fixes" class="level2">
<h2 class="anchored" data-anchor-id="plot-improvements-and-bug-fixes">Plot improvements and bug fixes</h2>
<p>Version 0.3.0 also includes several improvements and bug fixes related to plot rendering.</p>
<p>First, a bug that caused subgroup size bars to disappear from <code>plot()</code> and <code>plot_flip()</code> with waterfalls 1.1.4 has been fixed. Subgroup size bars are an important part of Theseus Plots because they show the sample size of each subgroup in both groups. Without them, it becomes harder to judge whether a large contribution comes from a large subgroup, a large rate difference, or both.</p>
<p>Second, subgroup size bar scaling has been improved. Bar heights are now computed consistently from the maximum plot score in both <code>plot()</code> and <code>plot_flip()</code>. This makes visual comparisons more stable across plot directions. The maximum height of these bars can still be controlled with the <code>bar_max_value</code> argument.</p>
<p>Third, <code>text_size</code> is now handled correctly when the current ggplot2 theme is applied. This makes text scaling more predictable when users customize plot themes.</p>
<div class="cell">
<pre>ship &lt;- create_ship(
  data_2024,
  data_2025,
  y = clicked,
  labels = c(&quot;2024&quot;, &quot;2025&quot;),
  text_size = 1.5
)

ship$plot(gender)</pre>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i2.wp.com/hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/figures/text_size-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="installation" class="level2">
<h2 class="anchored" data-anchor-id="installation">Installation</h2>
<p>You can install TheseusPlot from CRAN with:</p>
<pre>install.packages(&quot;TheseusPlot&quot;)</pre>
</section>
<section id="try-it-out" class="level2">
<h2 class="anchored" data-anchor-id="try-it-out">Try it out</h2>
<p>TheseusPlot is useful when you want to understand why rate metrics differ between two groups.</p>
<p>Typical examples include:</p>
<ul>
<li>click-through rate</li>
<li>conversion rate</li>
<li>retention rate</li>
<li>success rate</li>
<li>error rate</li>
</ul>
<p>For details, please see the package website:</p>
<ul>
<li><a href="https://hoxo-m.github.io/TheseusPlot/" rel="nofollow" target="_blank">https://hoxo-m.github.io/TheseusPlot/</a></li>
</ul>


</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/"> HOXO-M Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/theseusplot-0-3-0-visualizing-the-decomposition-of-differences-in-rate-metrics/">TheseusPlot 0.3.0: Visualizing the Decomposition of Differences in Rate Metrics</a>]]></content:encoded>
					
		
		<enclosure url="https://hoxo-m.github.io/blog/posts/TheseusPlot-0-3-0/figures/example-1.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">402085</post-id>	</item>
		<item>
		<title>How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student (Real Case Study)</title>
		<link>https://www.r-bloggers.com/2026/06/how-i-used-one-way-anova-in-r-to-analyze-crop-yield-data-for-a-phd-student-real-case-study/</link>
		
		<dc:creator><![CDATA[Unknown]]></dc:creator>
		<pubDate>Wed, 17 Jun 2026 10:12:31 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=41ca7943e3b1e71e5114c5fe09327503</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> My client's supervisor had rejected Chapter 4 twice. Not because the data was bad — the field trial was clean, the yield measurements precise. The problem was the statistics. And until I looked at the file, the student had no idea what was actually wro...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/how-i-used-one-way-anova-in-r-to-analyze-crop-yield-data-for-a-phd-student-real-case-study/">How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student (Real Case Study)</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html"> RStudioDataLab</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p style="text-align: justify;"><span style="font-family: inherit;">My client&#8217;s supervisor had rejected Chapter 4 twice. Not because the data was bad — the field trial was clean, the yield measurements precise. The problem was the statistics. And until I looked at the file, the student had no idea what was actually wrong.</span></p><p style="text-align: justify;"><span style="font-family: inherit;">This is the full story of how I ran one-way ANOVA in R to analyze wheat yield data from a three-treatment fertilizer trial, checked every assumption, wrote the APA results section, and delivered it in under 24 hours. I have helped over 500 researchers through this exact kind of problem. Here is what the process actually looks like.</span></p><p style="text-align: left;"></p><div class="separator" style="clear: both; text-align: justify;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdkk8AXDwU-LGNctZsKJmu6gI2Y-ouBIae3CZkM_MHcFNlu3vdMR1UNNxjpj3M7kkEK11rRSOpD3uMBV-nMhzsNvKU6M_c3FH6R0xtkbgktSvzJtea7M7hv8LAH0hYFHlJ-7Z8ONJyv46wXTK1N-YpYDTCuAudz92Pyx4FfSYq4fc9TWdC_60PRChN_G4/s1200/How%20I%20Used%20One-Way%20ANOVA%20in%20R%20to%20Analyze%20Crop%20Yield%20Data%20for%20a%20PhD%20Student%20(Real%20Case%20Study).jpg" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student (Real Case Study)" border="0" data-original-height="630" data-original-width="450" src="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdkk8AXDwU-LGNctZsKJmu6gI2Y-ouBIae3CZkM_MHcFNlu3vdMR1UNNxjpj3M7kkEK11rRSOpD3uMBV-nMhzsNvKU6M_c3FH6R0xtkbgktSvzJtea7M7hv8LAH0hYFHlJ-7Z8ONJyv46wXTK1N-YpYDTCuAudz92Pyx4FfSYq4fc9TWdC_60PRChN_G4/s16000/How%20I%20Used%20One-Way%20ANOVA%20in%20R%20to%20Analyze%20Crop%20Yield%20Data%20for%20a%20PhD%20Student%20(Real%20Case%20Study).jpg?resize=450%2C630&#038;ssl=1" title="How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD 
Student (Real Case Study)" data-recalc-dims="1" /></a></div><div style="text-align: justify;"><br /></div><span><div style="text-align: justify;"><br /></div></span><p></p><p style="text-align: justify;"><span style="font-family: inherit;">The dataset had three fertilizer treatments measured across three growing seasons at a UK agricultural research site. The student needed to know whether treatment type significantly affected crop yield — and which specific treatments differed from each other. That question calls for a <a href="https://www.rstudiodatalab.com/2023/06/ANOVA-Assumptions-Violated-Types-Methods-2023.html" rel="nofollow" target="_blank">one-way ANOVA</a>, followed by a post-hoc comparison.</span></p><div class="alert info" style="text-align: justify;"><span style="font-family: inherit;"><b>Info!</b><span id="docs-internal-guid-f37839e4-7fff-5dc7-37de-ae513729c2c7"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Sound familiar?</span><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> I can run this exact analysis on your data and deliver the full APA results section in 24 hours —</span><a href="https://wa.me/message/J6ELCCB6EW7YC1" style="text-decoration: none;" rel="nofollow" target="_blank"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> </span><span face="Arial, sans-serif" style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">WhatsApp me now →</span></a></span></span></div><span id="docs-internal-guid-929a3bad-7fff-0b2f-baea-1a4de4ab1ab9">
  <h2 dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 4pt; margin-top: 18pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 17pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">The Client&#8217;s Problem (Chapter 4 Rejected Twice)</span></h2><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">The student — a third-year PhD candidate at a leading UK university — was studying the effect of fertilizer type on wheat yield. She had collected three years of field trial data, cleaned it carefully, and handed it to her supervisor with a results section she had written herself.</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"><b>First rejection</b>. The supervisor&#8217;s comment: </span><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-style: italic; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">&#8220;You have used three separate t-tests to compare each pair of treatments. It inflates your Type I error rate. A single ANOVA is the correct method.&#8221;</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">She corrected it and resubmitted. <b>Second rejection</b>. 
This time: </span><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-style: italic; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">&#8220;You have run the ANOVA but not reported any assumption checks. I need to see normality and homogeneity of variance tests before I can accept these results.&#8221;</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">That is when she contacted me. She was eight months from her final submission date,</span><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> and her statistical </span><a href="https://www.rstudiodatalab.com/2023/09/Exploratory-Factor-Analysis.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span face="Arial, sans-serif" style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">analysis</span></a><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> — supposedly the most routine part of Chapter 4 — was stalling her progress.</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">The core problem was not unusual. Running three separate t-tests instead of a one-way ANOVA is one of the most common mistakes I see in thesis data analysis. 
Each t-test is run at α = .05, but across three tests the family-wise error rate (the chance of at least one false positive) climbs to roughly 14%. The supervisor was right to reject it. And skipping assumption checks is equally serious — reviewers and examiners expect to see that the model&#8217;s conditions were verified, not assumed.</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">I asked her to send the raw data file. Within 20 minutes of opening it, I had identified exactly what needed to be done:</span></p><ul style="font-family: inherit;"><li style="text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">a <a href="https://www.rstudiodatalab.com/2024/10/shapiro-wilk-normality-test-shapirotest.html" rel="nofollow" target="_blank">Shapiro-Wilk normality test</a> on the residuals;</span></li><li style="text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"><a href="https://www.rstudiodatalab.com/2025/02/levene-test-in-r-for-homogeneity-of.html" rel="nofollow" target="_blank">Levene&#8217;s test</a> for homogeneity of variance;</span></li><li style="text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">the main <a href="https://www.rstudiodatalab.com/2023/06/ANOVA-Assumptions-Violated-Types-Methods-2023.html" rel="nofollow" target="_blank">one-way ANOVA</a> using </span><span style="color: #188038; font-size: 11pt; 
font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">aov()</span><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">;</span></li><li style="text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">a <a href="https://www.rstudiodatalab.com/2023/06/Tukey-HSD-test-Parametric-2023.html" rel="nofollow" target="_blank">Tukey HSD post-hoc</a> test to identify which specific treatments differed;</span></li><li style="text-align: justify;"><span face="Arial, sans-serif" style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">then a clean APA write-up of all four results.</span></li></ul>
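<p style="text-align: justify;">The roughly 14% figure follows from the family-wise error formula 1 − (1 − α)<sup>k</sup> for k tests, each run at level α. The short sketch below is an added illustration, not part of the client&#8217;s analysis; the helper name <code>fwer()</code> is hypothetical. It treats the three pairwise t-tests as independent, which is only an approximation because each pair shares a group with another pair, and it also shows how a Bonferroni-style correction (α / k per test) brings the rate back below .05.</p>
<pre># Family-wise error rate for k independent tests, each run at level alpha
# (fwer() is a hypothetical helper written for this illustration)
fwer &lt;- function(alpha, k) 1 - (1 - alpha)^k

alpha &lt;- 0.05
k &lt;- 3  # Control vs Organic, Control vs Chemical, Organic vs Chemical

round(fwer(alpha, k), 4)      # three unadjusted t-tests
# [1] 0.1426

round(fwer(alpha / k, k), 4)  # Bonferroni: each test run at alpha / k
# [1] 0.0492</pre>
<p style="text-align: justify;">Tukey HSD, used later in this analysis, addresses the same problem by controlling the family-wise error rate across all pairwise comparisons at once.</p>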
  
  <div class="pRelate" style="font-family: inherit;"><div style="text-align: justify;"><b style="font-family: inherit;">Before we start, make sure you have:</b></div><ul>
  <li style="text-align: justify;"><a href="https://www.rstudiodatalab.com/2023/06/a-comprehensive-guide-to-rstudio.html" rel="nofollow" target="_blank">Comprehensive Guide: How to install RStudio</a></li>
  <li style="text-align: justify;"><a href="https://www.rstudiodatalab.com/2023/07/how-to-import-install-packages-r.html" rel="nofollow" target="_blank">How to Import and Install Packages in R: A Comprehensive Guide</a></li>
  <li style="text-align: justify;"><a href="https://www.rstudiodatalab.com/2024/09/how-to-import-data-into-r-load-data.html" rel="nofollow" target="_blank">How to Import Data into R | Load Data file in R Programming</a></li>
  <li style="text-align: justify;"><a href="https://www.rstudiodatalab.com/search/label/Data%20Preprocessing" rel="nofollow" target="_blank">Preprocess the data</a></li></ul></div>
  <div><span face="Arial, sans-serif" style="color: #0e101a; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><span id="docs-internal-guid-b4233d6e-7fff-9b60-0c82-a8768306f15b"><h2 dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 4pt; margin-top: 18pt; text-align: justify; white-space: pre-wrap;"><span style="font-size: 17pt; font-variant: normal; vertical-align: baseline;">The Dataset (Crop Yield, 3 Treatments × 3 Years)</span></h2><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;"><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline;">The trial measured wheat yield in kilograms per hectare (kg/ha) across three fertilizer treatments applied to field plots in Yorkshire from 2021 to 2023. 
Each treatment had three replicated plots per season, giving nine observations per group and 27 total.</span></p><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;">  <span style="font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline;">Treatments:</span></p><ul style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; margin-bottom: 0px; margin-top: 0px; padding-inline-start: 48px; white-space: pre-wrap;"><li aria-level="1" dir="ltr" style="font-size: 11pt; font-variant: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 12pt; text-align: justify;"><span style="font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Control</span><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> — no fertilizer applied</span></p></li><li aria-level="1" dir="ltr" style="font-size: 11pt; font-variant: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"><span style="font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Organic</span><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> — composted farmyard manure (25 t/ha)</span></p></li><li aria-level="1" dir="ltr" style="font-size: 11pt; font-variant: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 0pt; text-align: 
justify;"><span style="font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Chemical</span><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> — NPK granular fertilizer (120:60:60 kg/ha)</span></p></li></ul><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;"><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline;">Before running any test, I summarised the</span><a href="https://www.rstudiodatalab.com/2023/06/RStudio-Documentation-Your-Essential-Guide-to-Descriptive-Statistics.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;"> </span><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline;">descriptive statistics</span></a><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline;"> for each group:</span></p><div align="left" dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; margin-left: 0pt; white-space: pre-wrap;"><table style="border-collapse: collapse; border-color: currentcolor; border-image: initial; border-style: none; border-width: medium; border: none; text-align: justify;"><colgroup><col width="86"></col><col width="24"></col><col width="107"></col><col width="88"></col><col width="47"></col><col width="47"></col></colgroup><tbody><tr style="height: 26.5pt;"><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: center;"><span style="color: #0e101a; 
font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Treatment</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: center;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">n</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: center;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Mean (kg/ha)</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: center;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">SD (kg/ha)</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: center;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Min</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: center;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">Max</span></p></td></tr><tr style="height: 26.5pt;"><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: 
top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Control</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">9</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">2900.00</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">321.10</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">2393</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">3339</span></p></td></tr><tr style="height: 26.5pt;"><td style="overflow-wrap: break-word; overflow: hidden; padding: 
5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Organic</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">9</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">3400.00</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">276.97</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">3036</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">3933</span></p></td></tr><tr style="height: 26.5pt;"><td style="overflow-wrap: break-word; 
overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;">
    <span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Chemical</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">9</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">3900.00</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">353.52</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">3191</span></p></td><td style="overflow-wrap: break-word; overflow: hidden; padding: 5pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">4474</span></p></td></tr></tbody></table></div><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 
12pt; text-align: justify; white-space: pre-wrap;"><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline;">The 500 kg/ha step between successive groups looked substantively meaningful even before testing. The real question was whether that separation was large relative to the variability within each treatment (group SDs of roughly 277 to 354 kg/ha).</span></p><div style="font-family: inherit;"><h2 style="text-align: justify;"><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline; white-space-collapse: preserve;"><span style="font-size: large;">Step 1: Checking Normality (Shapiro-Wilk + Q-Q Plot)</span></span></h2><p style="text-align: left;"></p><div style="text-align: justify;"><span style="font-family: inherit; font-size: 14.6667px; white-space-collapse: preserve;">One-way ANOVA assumes that the residuals follow a normal distribution. I always run this check on the model residuals — not on raw group values — because that is what the assumption actually refers to.</span></div><span style="font-size: 14.6667px; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline; white-space-collapse: preserve;"><div style="text-align: justify;"><span style="font-family: inherit; font-size: 14.6667px;">I used the shapiro.test() function from base R (stats package) alongside a Q-Q plot.</span></div></span><p></p></div><h3 style="font-family: inherit; text-align: justify;"><span style="font-size: 14.6667px; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline; white-space-collapse: preserve;">Step 1 in R: load packages, enter the data and check normality</span></h3>
<pre>library(car)      # leveneTest(), used for the homogeneity-of-variance check
library(ggplot2)  # plotting (not needed for the base-R Q-Q plot below)

# Enter the trial data: 9 plots per treatment, 27 in total
treatment &lt;- factor(rep(c(&quot;Control&quot;, &quot;Organic&quot;, &quot;Chemical&quot;), each = 9),
                    levels = c(&quot;Control&quot;, &quot;Organic&quot;, &quot;Chemical&quot;))
yield &lt;- c(
  # Control: no fertilizer (kg/ha)
  2985, 2973, 3339, 3282, 3021, 2467, 2393, 2849, 2791,
  # Organic: composted farmyard manure (kg/ha)
  3310, 3435, 3143, 3401, 3299, 3933, 3724, 3319, 3036,
  # Chemical: NPK granular fertilizer (kg/ha)
  3848, 3913, 3876, 4149, 3191, 3967, 4054, 3628, 4474
)
crop_data &lt;- data.frame(treatment, yield)
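
# Added sketch (not in the original script): reproduce the descriptive table
# (n, mean, SD, min, max per treatment) directly from crop_data
desc &lt;- do.call(rbind, lapply(split(crop_data$yield, crop_data$treatment), function(y) {
  data.frame(n = length(y), mean = mean(y), sd = round(sd(y), 2),
             min = min(y), max = max(y))
}))
desc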

# Fit model first so residuals are available 
model &lt;- aov(yield ~ treatment, data = crop_data)
#  Shapiro-Wilk on residuals 
shapiro.test(residuals(model))

#  Q-Q plot 
qqnorm(residuals(model), main = &quot;Normal Q-Q Plot of Residuals&quot;)
qqline(residuals(model), col = &quot;red&quot;, lwd = 2)

</pre><span><div style="text-align: justify;"><br /></div><div class="separator" style="clear: both; font-family: inherit; text-align: justify;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgEZ41UZGqpOfo5eJNh5ziBeMK-_PNIA0u3gwDbgUC88LgxN8cVWnlyBn5RB5H6mvB_484NhrNjriTRPi23gCAASQMOZEm_D4acIPL2uC7ieeqOiejI4ZIVpdrNi4JqTcsaJaBimebU4HcHM2zryQc_7wJ5qoYzU6MAkMXja125tI1tmC-vU1mX_3TBbJw" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="Fit Anova model, Shapiro-wilk on residuals and Q-Q Plot working code" data-original-height="529" data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEgEZ41UZGqpOfo5eJNh5ziBeMK-_PNIA0u3gwDbgUC88LgxN8cVWnlyBn5RB5H6mvB_484NhrNjriTRPi23gCAASQMOZEm_D4acIPL2uC7ieeqOiejI4ZIVpdrNi4JqTcsaJaBimebU4HcHM2zryQc_7wJ5qoYzU6MAkMXja125tI1tmC-vU1mX_3TBbJw=s16000" title="Fit Anova model, Shapiro-wilk on residuals and Q-Q Plot working code" /></a></div><div class="separator" style="clear: both; font-family: inherit; text-align: justify;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhibmzfWHo7lAVkXeuQTsNaozyv41ZIxIqzwW9_vvXFlGwMfj--cpN_siAp0C6lfnaj6uQoJWP3n-kgXrcbXGM9xqOUFkKQ58QUW2QMcoutfzMTrUy5zKMKiyaqruijbI2XKuiJldMYzZvsCsS9AOS3K_zYCGr1810A8z4cmNx_6yDQN3NnoGiryQORUI0" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="Normal Q-Q Plot for residual in R" data-original-height="360" data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEhibmzfWHo7lAVkXeuQTsNaozyv41ZIxIqzwW9_vvXFlGwMfj--cpN_siAp0C6lfnaj6uQoJWP3n-kgXrcbXGM9xqOUFkKQ58QUW2QMcoutfzMTrUy5zKMKiyaqruijbI2XKuiJldMYzZvsCsS9AOS3K_zYCGr1810A8z4cmNx_6yDQN3NnoGiryQORUI0=s16000" title="Normal Q-Q Plot for residual in R" /></a></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h4 style="font-family: inherit; text-align: justify;">SPSS equivalent:</h4><pre>* After entering data in SPSS Data View.
* Note: NPPLOT here tests normality within each treatment group, not the ANOVA residuals.
EXAMINE VARIABLES=yield BY treatment
  /PLOT NPPLOT
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.</pre></span></span></span><p style="font-family: inherit; text-align: justify;"><span style="color: #0e101a; font-weight: 700;">The Output</span></p><pre>	Shapiro-Wilk normality test

data:  residuals(model)
W = 0.97692, p-value = 0.787</pre><p style="font-family: inherit; text-align: justify;"><span style="color: #0e101a;">The output shows W = 0.977 and p = .787. The Q-Q plot shows the points tracking closely along the reference line with no systematic departures.</span></p><h3 dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 13pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">What It Means</span></h3><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A Shapiro-Wilk p-value above .05 means we retain the </span><a href="https://www.rstudiodatalab.com/2023/06/hypothesis-testing-step-by-step-guide.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span style="background-color: transparent; color: #1155cc; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">null hypothesis</span></a><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> that the residuals are normally distributed. With p = .787, there is no evidence of non-normality. 
The normality assumption is reasonable for these data, and we can proceed to Levene's test.</span></p><p style="font-family: inherit; text-align: justify;"><span id="docs-internal-guid-95976b09-7fff-57c5-c4b9-3eb4e838cd0e"></span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">APA-formatted result:</span><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> W(27) = 0.977, p = .787</span></p></div><div><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><h2 dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 4pt; margin-top: 18pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 17pt; font-variant: normal; vertical-align: baseline;">Levene's Test (Homogeneity of Variance)</span></h2><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;">ANOVA also requires that the variance is roughly equal across groups; this is the homogeneity of variance assumption. 
I use Levene's test via the </span><span style="color: #188038; font-size: 11pt; font-variant: normal; vertical-align: baseline;">car</span><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;"> package because it is less sensitive to departures from normality than Bartlett's test. A full guide to</span><a href="https://www.rstudiodatalab.com/2025/02/levene-test-in-r-for-homogeneity-of.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;"> </span><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline;">homogeneity of variances</span></a><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;"> testing is available on this blog.</span></p><h3 dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 13pt; font-variant: normal; vertical-align: baseline;">The Code</span></h3><pre># Levene's test (center = median, Brown-Forsythe variant)
leveneTest(yield ~ treatment, data = crop_data, center = median)</pre><span style="font-size: medium;"><div style="text-align: justify;"><br /></div><div class="separator" style="clear: both; font-family: inherit; font-variant-caps: normal; font-variant-ligatures: normal; text-align: justify; white-space: normal;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEin6ZA5eYaF66HwBBku8LRkC8cFT-kILZHZbHPlQs2U10LGDShlWPUNui9NdB70GZ9Mr3gloTnPUO1o4O1CYTV1BuopJD5nBn-4HCsJLitdRz5Nxilf_RI7AavlfblmhvPnqU5aHRKOf4a2iUhBEAgRT7S8gG6qVw7Wj_lpE5eyouQOdALMkutcn1p7ylw" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="Levene's test (center = median, 
Brown-Forsythe variant)" data-original-height="100" data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEin6ZA5eYaF66HwBBku8LRkC8cFT-kILZHZbHPlQs2U10LGDShlWPUNui9NdB70GZ9Mr3gloTnPUO1o4O1CYTV1BuopJD5nBn-4HCsJLitdRz5Nxilf_RI7AavlfblmhvPnqU5aHRKOf4a2iUhBEAgRT7S8gG6qVw7Wj_lpE5eyouQOdALMkutcn1p7ylw=s16000" title="Levene's test (center = median, Brown-Forsythe variant)" /></a></div></span><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 11pt; font-weight: 700;">SPSS equivalent:</span></p><pre>* Levene's test is reported when HOMOGENEITY is requested.
ONEWAY yield BY treatment
  /STATISTICS HOMOGENEITY.</pre></span><p style="font-family: inherit; text-align: left;"><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><span style="color: #0e101a; text-align: justify; white-space: pre-wrap;">The output shows F(2, 24) = 0.12, p = .886. The group standard deviations (Control SD = 321, Organic SD = 277, Chemical SD = 354) are close to one another: the largest is less than 1.3 times the smallest.</span></span></p><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><h3 dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 13pt; font-variant: normal; vertical-align: baseline;">What It Means</span></h3><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;">With p = .886, we retain the null hypothesis of equal variances across the three fertilizer groups. The homogeneity assumption holds. 
Both assumption checks are cleared, so the ANOVA result is defensible.</span></p><p dir="ltr" style="font-family: inherit; font-size: 11pt; font-variant-caps: normal; font-variant-ligatures: normal; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify; white-space: pre-wrap;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline;">APA-formatted result:</span><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline;"> F(2, 24) = 0.12, p = .886</span></p><h2 dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 4pt; margin-top: 18pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 17pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">One-Way ANOVA in R (aov() Full Code + Output)</span></h2><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">With both assumptions confirmed, I ran the one-way ANOVA in R using </span><span style="background-color: transparent; color: #188038; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">aov()</span><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. This function fits the ANOVA model and partitions the total sum of squares into a treatment (between-group) component and a residual (within-group) component. 
The F statistic is the ratio of the corresponding mean squares, each sum of squares divided by its degrees of freedom.</span></p><pre># One-way ANOVA
model &lt;- aov(yield ~ treatment, data = crop_data)
summary(model)

# Effect size: eta-squared
ss   &lt;- summary(model)[[1]][, &quot;Sum Sq&quot;]
eta2 &lt;- ss[1] / sum(ss)        # SS_treatment / SS_total
cat(&quot;eta-squared =&quot;, round(eta2, 3), &quot;\n&quot;)

# Visualise yield distributions by group (box plot)
ggplot(crop_data, aes(x = treatment, y = yield, fill = treatment)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = &quot;Wheat Yield by Fertilizer Treatment&quot;,
       x = &quot;Treatment&quot;, y = &quot;Yield (kg/ha)&quot;) +
  theme_minimal() +
  theme(legend.position = &quot;none&quot;)</pre><div style="text-align: justify;"><br /></div><h3 dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; white-space: pre-wrap;">SPSS equivalent:</span></h3><pre>ONEWAY yield BY treatment
  /STATISTICS DESCRIPTIVES HOMOGENEITY EFFECTS
  /POSTHOC TUKEY ALPHA(0.05)
  /PLOT MEANS.</pre><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 13pt; font-weight: 700; white-space: pre-wrap;">The Output</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; white-space: pre-wrap;"></span></p><div class="separator" style="clear: both; font-family: inherit; text-align: justify;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhM9rOHCxSHqMEFP_HSlr2QMC5kKg5l8aj4B_Nz251BCjbK5jq8UdLspjmljOc-L4tnrdpMngOgO40E2UneGWyrTLz35NsL6uIEQKKEERJm9FKl52JTyrk-QYj3ZI2XCpm9oaBmgIHgDtWG1OoKirNdiaMz5qJwblOS19tqlbl67bgmFjnMhwkZ5i7S3VA" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="One way anova by using the rstudio" data-original-height="349" 
data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEhM9rOHCxSHqMEFP_HSlr2QMC5kKg5l8aj4B_Nz251BCjbK5jq8UdLspjmljOc-L4tnrdpMngOgO40E2UneGWyrTLz35NsL6uIEQKKEERJm9FKl52JTyrk-QYj3ZI2XCpm9oaBmgIHgDtWG1OoKirNdiaMz5qJwblOS19tqlbl67bgmFjnMhwkZ5i7S3VA=s16000" title="One way anova by using the rstudio" /></a></div><div class="separator" style="clear: both; font-family: inherit; text-align: justify;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgaubV7OaxcANgVMxtImZCedyjj_DwU-rZU9JJYN_J-v4cn-xyV0PVVMydzO98p5YRxIOjy4adPp7jUqOPyGe0FhvFhrm4iBxMTL4m33fUYfoQiiMuZ8Jjs1p3ECFHc5Pai0JQ_1xcWtIb9Iz9e_hzduSGukfoUvOoT1DysBbWExFRW7V8fUROqj90gfR4" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="Wheat Yield by Fertilizer Treatment" data-original-height="360" data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEgaubV7OaxcANgVMxtImZCedyjj_DwU-rZU9JJYN_J-v4cn-xyV0PVVMydzO98p5YRxIOjy4adPp7jUqOPyGe0FhvFhrm4iBxMTL4m33fUYfoQiiMuZ8Jjs1p3ECFHc5Pai0JQ_1xcWtIb9Iz9e_hzduSGukfoUvOoT1DysBbWExFRW7V8fUROqj90gfR4=s16000" title="Wheat Yield by Fertilizer Treatment" /></a></div><p style="font-family: inherit;"></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; white-space: pre-wrap;">The output shows that treatment accounts for SS = 4,500,000 with MS = 2,250,000. The residual MS (within-group variance) is 101,598. 
Dividing the treatment MS by the residual MS gives F = 2,250,000 / 101,598 = 22.15.</span></p><h3 dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 13pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">What It Means</span></h3><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The p-value (about .000004) falls far below any conventional significance threshold. There is a statistically significant difference in mean wheat yield across the three fertilizer treatments. The η² = .65 means that fertilizer treatment accounts for about 65% of the total variability (sum of squares) in yield in this sample, a large effect by Cohen's conventional benchmarks. 
This was the number the supervisor had been waiting to see justified properly.</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">APA-formatted result:</span><span style="background-color: transparent; color: #0e101a; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> F(2, 24) = 22.15, p &lt; .001, η² = .65, 95% CI [.35, .76]</span></p><div class="alert success" style="font-family: inherit;"><span><span style="color: #0e101a; font-size: 14.6667px; font-weight: 700; text-align: justify; white-space: pre-wrap;">Need these results written in APA format for your thesis? </span><span style="color: #0e101a; font-size: 11pt; font-variant: normal; text-align: justify; vertical-align: baseline; white-space: pre-wrap;">That is exactly what I deliver with every project. 
Visit my</span><a href="https://www.rstudiodatalab.com/p/thesis-data-analysis-service.html" style="text-align: justify; text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> </span><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">thesis data analysis service</span></a><span style="color: #0e101a; font-size: 11pt; font-variant: normal; text-align: justify; vertical-align: baseline; white-space: pre-wrap;"> to see what is included.</span></span></div><span id="docs-internal-guid-3f740b6c-7fff-2f2b-dba9-d97acb04f3ae" style="font-family: inherit;"><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 4pt; margin-top: 18pt; text-align: justify;"><span style="color: #0e101a; font-size: 17pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Tukey HSD Post-Hoc (Which Treatments Differed)</span></h2><p dir="ltr" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">A significant ANOVA </span><a href="https://www.rstudiodatalab.com/2024/06/how-to-do-f-test-in-r-compare-variances.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">F-test</span></a><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> only tells you that at least one group mean differs. It does not say which ones. 
For that, I ran a Tukey Honestly Significant Difference (HSD) post-hoc test, the standard choice for comparing all three pairs of means at once because it controls the family-wise error rate. For a full discussion of</span><a href="https://www.rstudiodatalab.com/2023/06/post-hoc-test-types-software-data.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> </span><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">post-hoc test types</span></a><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> and when each applies, see the linked guide.</span></p><h3 dir="ltr" style="line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; text-align: justify;"><span style="color: #0e101a; font-size: 13pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">The Code</span></h3><pre># Tukey HSD for all pairwise treatment comparisons
tukey_result &lt;- TukeyHSD(model, conf.level = 0.95)
print(tukey_result)

# Plot the confidence intervals
plot(tukey_result, las = 1, col = &quot;steelblue&quot;)</pre></span></span><h3 style="font-family: inherit; text-align: left;"><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><span><span style="color: #0e101a; font-family: inherit; font-size: 11pt; font-weight: 700; text-align: justify; white-space: pre-wrap;">SPSS equivalent:</span></span></span></h3><span style="font-family: inherit; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: 
baseline;"><span><pre>* Already requested in the ONEWAY syntax above via /POSTHOC TUKEY.
* Results appear in the &quot;Multiple Comparisons&quot; output table.</pre></span></span><h3 style="font-family: inherit; text-align: left;"><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><span style="color: #0e101a; font-family: inherit; font-size: 13pt; text-align: justify; white-space: pre-wrap;">The Output</span></span></h3><span style="font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-emoji: normal; font-variant-numeric: normal; font-variant-position: normal; vertical-align: baseline;"><span><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-family: inherit; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"></span></p><div class="separator" style="clear: both; font-family: inherit; text-align: justify;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjW4NREPgN4nUbq5fPB2cu0RoKpk7OPjHO1xilpdKvnaG4AwAHUVN5lQMUJIj8YmWd0XkF_jNMcctEFtbvJLZrFW5wYQ4vwgri8Ls7-TMeWELcGgXUQ7FCLYscJDg6_wH6LuaIy-v-ZMRm-q5nRdH3DTCt8AsC4S54S_yiJKpnB7qZKJm8rFscAHYeyx8g" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="tukey test to compare the yield by treatment" data-original-height="265" data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEjW4NREPgN4nUbq5fPB2cu0RoKpk7OPjHO1xilpdKvnaG4AwAHUVN5lQMUJIj8YmWd0XkF_jNMcctEFtbvJLZrFW5wYQ4vwgri8Ls7-TMeWELcGgXUQ7FCLYscJDg6_wH6LuaIy-v-ZMRm-q5nRdH3DTCt8AsC4S54S_yiJKpnB7qZKJm8rFscAHYeyx8g=s16000" title="tukey test to compare the yield by treatment" /></a></div><div style="text-align: justify;"><br /></div><div class="separator" style="clear: both; font-family: inherit; 
text-align: justify;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEh92FBrN-vbZ1_VmH6S3FYWWOIrOBlnJ1-NALUf7db7al7HJwj3ruF5FM9-8OqXLzqqHyCTWJThpdtO8V3vU-haXWAC1MzXDA1gyafprJpDk8gFdMhWSAWsOlRN5NPcU7SA9c-YMFlqXQA-C3vZ2yEKGhgHXDsTrzYGytQCMoIhdPUhQgClsswzBYyZDmM" style="margin-left: 1em; margin-right: 1em;" rel="nofollow" target="_blank"><img alt="Difference in mean level by treatment in rstudio" data-original-height="360" data-original-width="450" src="https://blogger.googleusercontent.com/img/a/AVvXsEh92FBrN-vbZ1_VmH6S3FYWWOIrOBlnJ1-NALUf7db7al7HJwj3ruF5FM9-8OqXLzqqHyCTWJThpdtO8V3vU-haXWAC1MzXDA1gyafprJpDk8gFdMhWSAWsOlRN5NPcU7SA9c-YMFlqXQA-C3vZ2yEKGhgHXDsTrzYGytQCMoIhdPUhQgClsswzBYyZDmM=s16000" title="Difference in mean level by treatment in rstudio" /></a></div></span><p style="font-family: inherit;"></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-family: inherit; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">The output shows that all three pairwise</span><a href="https://www.rstudiodatalab.com/2023/10/confidence-intervals-in-r.html" style="font-family: inherit; text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> </span><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">confidence intervals</span></a><span style="color: #0e101a; font-family: inherit; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> lie entirely above zero, so none of the treatment differences is consistent with a zero effect at the 95% family-wise confidence level.</span></p><h3 dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 4pt; margin-top: 14pt; 
text-align: justify;"><span style="color: #0e101a; font-size: 13pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">What It Means</span></h3><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Every pairwise comparison is statistically significant. Organic fertilizer produced 500 kg/ha more yield than the control (p = .008). Chemical fertilizer produced 1,000 kg/ha more than the control (p = .000002). And chemical outperformed organic by another 500 kg/ha (p = .008). Each successive step in the treatment sequence produced a meaningful, detectable gain.</span></p><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; font-weight: 700; vertical-align: baseline; white-space: pre-wrap;">APA-formatted results:</span></p><ul style="font-family: inherit; margin-bottom: 0px; margin-top: 0px; padding-inline-start: 48px;"><li aria-level="1" dir="ltr" style="color: #0e101a; font-size: 11pt; font-variant: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 12pt; text-align: justify;"><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Organic vs. 
Control: diff = 500.00 kg/ha, 95% CI [124.76, 875.24], p = .008</span></p></li><li aria-level="1" dir="ltr" style="color: #0e101a; font-size: 11pt; font-variant: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Chemical vs. Control: diff = 1000.00 kg/ha, 95% CI [624.76, 1375.24], p &lt; .001</span></p></li><li aria-level="1" dir="ltr" style="color: #0e101a; font-size: 11pt; font-variant: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 12pt; margin-top: 0pt; text-align: justify;"><span style="font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">Chemical vs. Organic: diff = 500.00 kg/ha, 95% CI [124.76, 875.24], p = .008</span></p></li></ul><p dir="ltr" style="font-family: inherit; line-height: 1.38; margin-bottom: 12pt; margin-top: 12pt; text-align: justify;"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">The</span><a href="https://www.rstudiodatalab.com/2023/06/Tukey-HSD-test-Parametric-2023.html" style="text-decoration: none;" rel="nofollow" target="_blank"><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> </span><span style="color: #1155cc; font-size: 11pt; font-variant: normal; text-decoration-skip-ink: none; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">Tukey HSD test</span></a><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> is implemented in base R via </span><span style="color: #188038; font-size: 11pt; font-variant: normal; vertical-align: 
baseline; white-space: pre-wrap;">TukeyHSD()</span><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> and requires no additional packages after </span><span style="color: #188038; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;">aov()</span><span style="color: #0e101a; font-size: 11pt; font-variant: normal; vertical-align: baseline; white-space: pre-wrap;"> has been fitted.</span></p><div style="font-family: inherit;">
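<p style="text-align: justify;">As a cross-check on the arithmetic quoted in the ANOVA and Tukey sections (treatment SS = 4,500,000, residual MS = 101,598, F = 22.15, and Tukey intervals of about ±375 kg/ha around each difference), the sketch below recomputes both from the textbook formulas instead of calling aov() and TukeyHSD(). It uses only base R (pf() and qtukey() from the stats package) and the same 27 yield values as the main example. It assumes equal group sizes, which is what lets a single HSD half-width apply to every pair; with unequal groups, TukeyHSD() uses the Tukey-Kramer standard error, which reduces to this formula when the groups are balanced.</p>

```r
# Hand computation of the one-way ANOVA F test and Tukey HSD intervals,
# using the yield data from this post (kg/ha). Base R only.
yield <- list(
  Control  = c(2985, 2973, 3339, 3282, 3021, 2467, 2393, 2849, 2791),
  Organic  = c(3310, 3435, 3143, 3401, 3299, 3933, 3724, 3319, 3036),
  Chemical = c(3848, 3913, 3876, 4149, 3191, 3967, 4054, 3628, 4474)
)

n     <- lengths(yield)            # observations per group (9 each)
k     <- length(yield)             # number of groups (3)
N     <- sum(n)                    # total observations (27)
means <- sapply(yield, mean)       # group means
grand <- mean(unlist(yield))       # grand mean

# Between-group (treatment) and within-group (residual) sums of squares
ss_between <- sum(n * (means - grand)^2)
ss_within  <- sum(sapply(yield, function(g) sum((g - mean(g))^2)))

df_between <- k - 1                # 2
df_within  <- N - k                # 24
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

f_stat <- ms_between / ms_within                        # ratio of mean squares
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
eta2   <- ss_between / (ss_between + ss_within)         # SS_treatment / SS_total

# Tukey HSD: studentized range critical value times the standard error of
# a difference; balanced groups, so one half-width applies to all pairs
q_crit <- qtukey(0.95, nmeans = k, df = df_within)
hsd    <- q_crit * sqrt(ms_within / n[1])

pairs <- combn(names(yield), 2)
diffs <- apply(pairs, 2, function(p) means[p[2]] - means[p[1]])
names(diffs) <- apply(pairs, 2, function(p) paste(p[2], "-", p[1]))

cat(sprintf("F(%d, %d) = %.2f, p = %.2g, eta^2 = %.2f\n",
            df_between, df_within, f_stat, p_val, eta2))
print(cbind(diff = diffs, lwr = diffs - hsd, upr = diffs + hsd))
```

<p style="text-align: justify;">If these hand-computed values agree with the summary(model) and TukeyHSD(model) output, that is a quick confirmation that the data were entered correctly before the results are written up.</p>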
  
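<p style="text-align: justify;">Retyping statistics into a results paragraph is where transcription slips creep in, so it can help to build the reporting string straight from the fitted model. The sketch below is a minimal example under stated assumptions: format_p() and format_apa_f() are hypothetical helper names introduced here, not functions from any package, and the rules they apply (no leading zero on p or η², and p-values below .001 written as "p &lt; .001") follow the APA 7th edition convention. The helper writes "eta-squared" in plain text to avoid locale-dependent encoding of η², and it rebuilds crop_data from the same 27 yield values so it runs on its own.</p>

```r
# Hypothetical helpers for APA-style reporting of a one-way ANOVA.
# format_p() and format_apa_f() are illustrative names, not package functions.

# APA style: no leading zero (p cannot exceed 1); values below .001 as "< .001"
format_p <- function(p) {
  if (p < 0.001) return("p < .001")
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Build "F(df1, df2) = x.xx, p ..., eta-squared = .xx" from an aov() fit
format_apa_f <- function(model) {
  tab  <- summary(model)[[1]]            # ANOVA table: treatment row, then residuals
  df1  <- tab[1, "Df"]
  df2  <- tab[2, "Df"]
  f    <- tab[1, "F value"]
  p    <- tab[1, "Pr(>F)"]
  eta2 <- tab[1, "Sum Sq"] / sum(tab[, "Sum Sq"])   # SS_treatment / SS_total
  sprintf("F(%d, %d) = %.2f, %s, eta-squared = %s",
          df1, df2, f, format_p(p), sub("^0", "", sprintf("%.2f", eta2)))
}

# Same data as the main example (kg/ha)
crop_data <- data.frame(
  treatment = factor(rep(c("Control", "Organic", "Chemical"), each = 9),
                     levels = c("Control", "Organic", "Chemical")),
  yield = c(2985, 2973, 3339, 3282, 3021, 2467, 2393, 2849, 2791,
            3310, 3435, 3143, 3401, 3299, 3933, 3724, 3319, 3036,
            3848, 3913, 3876, 4149, 3191, 3967, 4054, 3628, 4474)
)
model <- aov(yield ~ treatment, data = crop_data)

apa_line <- format_apa_f(model)
print(apa_line)   # "F(2, 24) = 22.15, p < .001, eta-squared = .65"
```

<p style="text-align: justify;">The same format_p() helper can be applied to the Shapiro-Wilk, Levene, and Tukey p-values so that every statistic in the paragraph follows one rule.</p>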
  <h2 style="text-align: justify;">ANOVA Results Interpretation: How I Wrote the APA Results Section</h2><p></p><p style="text-align: justify;">This is where most students get stuck: not the test itself, but translating the output into the precise language a supervisor or examiner expects. I wrote the following paragraph for her Chapter 4, and she submitted it with only minor phrasing adjustments.</p><p style="text-align: justify;"><i>A one-way ANOVA was conducted to examine the effect of fertilizer treatment (Control, Organic, Chemical) on wheat yield (kg/ha). Prior to analysis, residuals were assessed for normality using the Shapiro-Wilk test, W(27) = 0.977, p = .787, and homogeneity of variance was assessed with Levene's test, F(2, 24) = 0.12, p = .886. Both assumptions were satisfied. The ANOVA revealed a statistically significant effect of treatment on yield, F(2, 24) = 22.15, p &lt; .001, η² = .65, 95% CI [.35, .76], indicating a large effect. Post-hoc comparisons using the Tukey HSD procedure showed that Chemical fertilizer produced significantly greater yield than Control (diff = 1000.00 kg/ha, 95% CI [624.76, 1375.24], p &lt; .001) and than Organic (diff = 500.00 kg/ha, 95% CI [124.76, 875.24], p = .008). Organic treatment also significantly outperformed Control (diff = 500.00 kg/ha, 95% CI [124.76, 875.24], p = .008). Mean yields were M = 2900.00 (SD = 321.10), M = 3400.00 (SD = 276.97), and M = 3900.00 (SD = 353.52) kg/ha for Control, Organic, and Chemical treatments, respectively.</i></p><p style="text-align: justify;"><b>Notice the structure</b>: </p><p></p><ul><li style="text-align: justify;">State the test.</li><li style="text-align: justify;">Report the assumption checks with exact statistics.</li><li style="text-align: justify;">Give the main F result with all required elements (degrees of freedom, F, p, and effect size).</li><li style="text-align: justify;">Report each post-hoc comparison on its own, with its 95% CI. 
</li></ul><div style="text-align: justify;"><span style="font-family: inherit;">Supervisors who know APA format check for every one of those components. Missing the </span><a href="https://www.rstudiodatalab.com/2023/08/p-value-less-than-0.05.html" style="font-family: inherit;" rel="nofollow" target="_blank">p-value</a><span style="font-family: inherit;"> interpretation detail, omitting effect size, or writing "p < .05" instead of the exact value — any of these will draw a comment.</span></div><p></p><p style="text-align: justify;">The write-up also deliberately leads with the <a href="https://www.rstudiodatalab.com/2023/07/Correlation-Assumptions-Types-Example.html" rel="nofollow" target="_blank">assumption check </a>results, not the ANOVA result. That ordering signals to the examiner that you know what must be verified before the main test can be trusted.</p><p class="note"></p><div style="text-align: justify;"><b style="font-family: inherit;">Complete Analysis!</b></div><div style="text-align: justify;"><span style="font-family: inherit;">I delivered the complete analysis — assumption checks, ANOVA output, Tukey post-hoc, APA write-up, and annotated R script — within 22 hours of receiving the data.</span></div><p></p><p></p><p></p><h2 style="text-align: justify;">The Outcome (Supervisor's Response)</h2><p style="text-align: justify;">The student submitted the revised Chapter 4 three days later. Her supervisor approved the statistical section without further comment and cleared her to move forward with Chapter 5. She messaged me the following week to say the thesis had been submitted to the internal examiner on schedule.</p><p style="text-align: justify;">The turnaround from a second rejection to supervisor approval: four days. The statistical work itself: 22 hours from data receipt to delivery.</p><p></p><ul><li style="text-align: justify;">What changed was <b>not the data </b>— none of it had been re-collected or altered. 
</li><li style="text-align: justify;">What changed was the <b>analysis structure</b>: the correct test for the design, documented assumption checks, exact APA formatting throughout, and a post-hoc method that controls the family-wise error rate across multiple comparisons.</li></ul><p></p><p style="text-align: justify;">Her supervisor's specific feedback on the revised section: "<i>Statistical analysis is now reported correctly and in full.</i>" That is the language that closes a Chapter 4.</p><h2 style="text-align: justify;">Need the Same Done for Your Data?</h2><p style="text-align: justify;">If your supervisor has flagged your statistical analysis, or you're not confident your current approach is defensible, here is what I deliver:</p><p></p><ol class="steps"><li style="text-align: justify;">Complete analysis in R, SPSS, or Minitab (your choice of software)</li><li style="text-align: justify;">APA-formatted results section written and ready to paste into your thesis</li><li style="text-align: justify;">Publication-quality figures: box plots, Q-Q plots, and means plots with error bars</li><li style="text-align: justify;">Unlimited revisions until your supervisor approves</li></ol><p></p><h3 style="text-align: justify;">Turnaround, pricing, and contact:</h3><p></p><ol><li style="text-align: justify;">Thesis chapter (like the one above): 24–48 hours</li><li style="text-align: justify;">Individual assignment or coursework: 6–12 hours</li><li style="text-align: justify;">Pricing: from $25 (assignments) and from $150 (full thesis chapter)</li><li style="text-align: justify;">WhatsApp me for a free consultation</li><li style="text-align: justify;">Also available on Fiverr and Upwork</li></ol><p></p><p style="text-align: justify;">Or visit the service page: <a href="https://www.rstudiodatalab.com/p/thesis-data-analysis-service.html" rel="nofollow" target="_blank">thesis data analysis service</a></p><p style="text-align: justify;">I have worked with PhD and Master's students from institutions across the UK, Pakistan, Australia, and the US. My rating is 4.9/5 across 500+ completed projects. If your Chapter 4 has been rejected, or you want to submit it knowing it will not be, send me your data and let me show you what the analysis should look like.</p><h2 style="text-align: justify;">Frequently Asked Questions</h2>
  <h3 style="text-align: justify;"><span style="font-family: Lora, serif; font-size: 17.5px;">How do I run a one-way ANOVA in R?</span></h3><p style="text-align: justify;">Use the aov() function from base R: model <- aov(yield ~ treatment, data = your_data), with treatment stored as a factor rather than numeric group codes. Then inspect the result with summary(model). Before trusting the output, check normality of residuals with shapiro.test(residuals(model)) and homogeneity of variance with leveneTest() from the car package.</p><h3 style="text-align: justify;">How do I report one-way ANOVA results in APA format?</h3><p style="text-align: justify;">APA format requires: F(df_between, df_within) = value, p = exact_value, eta-squared = value, 95% CI [lower, upper]. For example: F(2, 24) = 22.15, p < .001, eta-squared = .65, 95% CI [.35, .76]. Report exact p-values to two or three decimals (e.g., p = .008) instead of thresholds such as p < .05. APA style uses the inequality only for values below .001, written p < .001. Also report Tukey post-hoc comparisons with pairwise differences, confidence intervals, and adjusted p-values.</p><h3 style="text-align: justify;">What post-hoc test should I use after a significant one-way ANOVA in R?</h3><p style="text-align: justify;">Use Tukey HSD (Honestly Significant Difference) when comparing all possible group pairs, as it controls the family-wise error rate. In R, run TukeyHSD(model, conf.level = 0.95) after fitting the aov() model. Tukey HSD is the most widely accepted post-hoc test for balanced ANOVA designs and is the method most PhD supervisors and journal reviewers expect.</p><h3 style="text-align: justify;">Should I check normality before running ANOVA in R?</h3><p style="text-align: justify;">Yes. Run the Shapiro-Wilk test on the model residuals using shapiro.test(residuals(model)), not on the raw data values. A p-value above .05 indicates no significant departure from normality. Supplement with a Q-Q plot using qqnorm() and qqline(). Also run Levene's test for homogeneity of variance using leveneTest() from the car package.</p><h3 style="text-align: justify;">What is eta-squared and how do I calculate it in R?</h3><p style="text-align: justify;">Eta-squared (eta²) measures effect size for ANOVA as the proportion of total variance explained by the treatment factor. Calculate it in R as: ss <- summary(model)[[1]][, 'Sum Sq']; eta2 <- ss[1] / sum(ss). Values of .01, .06, and .14 are typically interpreted as small, medium, and large effects. Always report eta² with its 95% confidence interval.</p><h3 style="text-align: justify;">Why was my Chapter 4 ANOVA rejected by my supervisor?</h3><p style="text-align: justify;">The two most common reasons are: (1) using multiple t-tests instead of a single ANOVA, which inflates the Type I error rate; and (2) running the ANOVA without first reporting assumption checks (Shapiro-Wilk normality test and Levene's homogeneity test). A third frequent issue is reporting 'p < .05' instead of the exact p-value, which does not meet APA standards. Any one of these is likely to prompt a rejection from your supervisor.</p>
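<p style="text-align: justify;">The FAQ answers above can be combined into one short script. This is a minimal sketch, not the script delivered to the student. The thesis data is not published here, so the script uses R's built-in PlantGrowth data as a stand-in: dried plant weight under a control and two treatments, with 10 plants per group. To avoid needing extra packages, it computes the Brown-Forsythe form of Levene's test by hand as an ANOVA on absolute deviations from the group medians. This is the same calculation car::leveneTest() runs with its default center = median. The helpers drop_zero() and apa_p() are illustrative names, not functions from any package.</p>
<pre>
# One-way ANOVA workflow on R's built-in PlantGrowth data (stand-in for the thesis data)
data(PlantGrowth)  # weight by group: ctrl, trt1, trt2 (10 plants each)
model <- aov(weight ~ group, data = PlantGrowth)
anova_tab <- summary(model)[[1]]  # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(anova_tab)

# Assumption 1: normality of the model residuals (not the raw values)
sw <- shapiro.test(residuals(model))
print(sw)
qqnorm(residuals(model)); qqline(residuals(model))

# Assumption 2: Brown-Forsythe/Levene test computed by hand as an ANOVA on
# absolute deviations from each group's median (what car::leveneTest() does
# with center = median). Indexing by the factor uses its level codes.
grp_median <- tapply(PlantGrowth$weight, PlantGrowth$group, median)
abs_dev <- abs(PlantGrowth$weight - grp_median[PlantGrowth$group])
levene_tab <- anova(lm(abs_dev ~ PlantGrowth$group))
print(levene_tab)

# Effect size: eta-squared = SS_between / SS_total
ss <- anova_tab[, "Sum Sq"]
eta2 <- ss[1] / sum(ss)

# Post-hoc: Tukey HSD differences, 95% CIs and adjusted p-values for every pair
tukey <- TukeyHSD(model, conf.level = 0.95)
print(tukey$group)

# Hypothetical helpers for APA-style numbers (no leading zero; "p < .001" below .001)
drop_zero <- function(x, digits) sub("^0", "", sprintf(paste0("%.", digits, "f"), x))
apa_p <- function(p) if (p < 0.001) "p < .001" else paste("p =", drop_zero(p, 3))

apa_f <- sprintf("F(%d, %d) = %.2f, %s, eta-squared = %s",
                 anova_tab$Df[1], anova_tab$Df[2], anova_tab[["F value"]][1],
                 apa_p(anova_tab[["Pr(>F)"]][1]), drop_zero(eta2, 2))
cat(apa_f, "\n")
</pre>
<p style="text-align: justify;">The script only formats the F line automatically. The assumption statistics and the Tukey rows still need to be written into the results paragraph in the order described earlier, and the 95% CI for eta-squared needs a separate calculation.</p>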
  

<hr style="text-align: justify;" /><p class="note" style="text-align: justify;"><span style="font-family: inherit;"><span data-preserver-spaces="true" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; color: #0e101a; margin-bottom: 0pt; margin-top: 0pt;">Transform your raw data into actionable insights. Let my expertise in R and advanced data analysis techniques unlock the power of your information. Get a personalized consultation and see how I can streamline your projects, saving you time and driving better decision-making. Contact me today at contact@rstudiodatalab.com or </span><a class="editor-rtfLink" href="https://www.rstudiodatalab.com/p/order-now.html" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; color: #4a6ee0; margin-bottom: 0pt; margin-top: 0pt;" rel="nofollow" target="_blank"><span data-preserver-spaces="true" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; margin-bottom: 0pt; margin-top: 0pt;">visit the order page</span></a><span data-preserver-spaces="true" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; color: #0e101a; margin-bottom: 0pt; margin-top: 0pt;"> to schedule your discovery call.</span></span></p>
<div style="text-align: justify;"><span style="font-family: inherit;"><a class="button" href="https://www.rstudiodatalab.com/p/join-our-community.html" rel="nofollow" target="_blank"><i class="icon demo"></i>Join Our Community</a> 
<a class="button" href="https://www.rstudiodatalab.com/p/order-now.html" rel="nofollow" target="_blank">
  <svg class="line" style="margin-right: 12px; stroke: rgb(255, 255, 255);" viewbox="0 0 24 24"><g transform="translate(2.000000, 2.500000)"><path d="M0.7501,0.7499 L2.8301,1.1099 L3.7931,12.5829 C3.8701,13.5199 4.6531,14.2389 5.5931,14.2359094 L16.5021,14.2359094 C17.3991,14.2379 18.1601,13.5779 18.2871,12.6899 L19.2361,6.1319 C19.3421,5.3989 18.8331,4.7189 18.1011,4.6129 C18.0371,4.6039 3.1641,4.5989 3.1641,4.5989"></path><line x1="12.1251" x2="14.8981" y1="8.2948" y2="8.2948"></line><path d="M5.1544,17.7025 C5.4554,17.7025 5.6984,17.9465 5.6984,18.2465 C5.6984,18.5475 5.4554,18.7915 5.1544,18.7915 C4.8534,18.7915 4.6104,18.5475 4.6104,18.2465 C4.6104,17.9465 4.8534,17.7025 5.1544,17.7025 Z"></path><path d="M16.4347,17.7025 C16.7357,17.7025 16.9797,17.9465 16.9797,18.2465 C16.9797,18.5475 16.7357,18.7915 16.4347,18.7915 C16.1337,18.7915 15.8907,18.5475 15.8907,18.2465 C15.8907,17.9465 16.1337,17.7025 16.4347,17.7025 Z"></path></g></svg>
  <span>Book a free call</span></a></span></div>




<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "RStudiodatalab",
  "image": "https://blogger.googleusercontent.com/img/a/AVvXsEigqr2QHzVWYRP6N89q6Bu2cNjCvN7g8P5pWqQVmHfLWVUb2nrXfp7Qo64bmJN9M9rD8brW5SpBcLUTsAiT70iC0JCz1FXGNgN0GuylxoHsV18t19GD-s_tieNOwa36_bQ3vU9UN8X7GeGJD3SGQfSnDko4OV_cogw2fbliLPZgjPAOSwOpGI_Z9C3B8_DU=w150-h150-p-k-no-nu-rw-e90",
  "@id": "https://www.rstudiodatalab.com/",
  "url": "https://www.rstudiodatalab.com/",
  "telephone": "+923106367532",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Near Chaze up",
    "addressLocality": "Multan",
    "postalCode": "60700",
    "addressCountry": "PK",
    "addressRegion": "PU"
  },
  "priceRange": "50",
  "sameAs": [
    "https://www.facebook.com/RStudioDataLab",
    "https://www.instagram.com/rstudiodatalab/",
    "https://twitter.com/rstudiodatalab",
    "https://youtube.com/@rstudiodatalab",
    "https://www.linkedin.com/company/rstudiodatalabs",
    "https://www.tiktok.com/@rstudiodatalab",
    "https://whatsapp.com/channel/0029VaBzfy80G0XbCXhGGA16"
  ],
  "openingHoursSpecification": {
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": [
      "Monday",
      "Tuesday",
      "Wednesday",
      "Thursday",
      "Friday",
      "Saturday",
      "Sunday"
    ],
    "opens": "00:00",
    "closes": "23:59"
  }
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Rstudiodatalab",
  "url": "https://www.rstudiodatalab.com/",
  "logo": "https://blogger.googleusercontent.com/img/a/AVvXsEigqr2QHzVWYRP6N89q6Bu2cNjCvN7g8P5pWqQVmHfLWVUb2nrXfp7Qo64bmJN9M9rD8brW5SpBcLUTsAiT70iC0JCz1FXGNgN0GuylxoHsV18t19GD-s_tieNOwa36_bQ3vU9UN8X7GeGJD3SGQfSnDko4OV_cogw2fbliLPZgjPAOSwOpGI_Z9C3B8_DU=w150-h150-p-k-no-nu-rw-e90",
  "alternateName": "Rstudiodatalab",
  "sameAs": [
    "https://www.facebook.com/RStudioDataLab",
    "https://www.instagram.com/rstudiodatalab/",
    "https://twitter.com/rstudiodatalab",
    "https://youtube.com/@rstudiodatalab",
    "https://www.linkedin.com/company/rstudiodatalabs",
    "https://www.tiktok.com/@rstudiodatalab",
    "https://whatsapp.com/channel/0029VaBzfy80G0XbCXhGGA16"
  ]
}
</script>
  
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html"
  },
  "headline": "How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student (Real Case Study)",
  "description": "Real case study: How I ran one-way ANOVA in R for a PhD student's agricultural thesis. Full code, Tukey HSD post-hoc, APA results, and how the supervisor approved it.",
  "image": {
    "@type": "ImageObject",
    "url": "https://www.rstudiodatalab.com/[PATH-TO-FEATURED-IMAGE].png",
    "width": 1200,
    "height": 630,
    "caption": "One-way ANOVA in R — crop yield analysis for PhD thesis (alt: one way anova in r crop yield agricultural trial)"
  },
  "author": {
    "@type": "Person",
    "name": "Dr. Zubair Goraya",
    "jobTitle": "Statistical Data Analyst",
    "description": "Expert in R, SPSS, and Minitab statistical analysis for agricultural and biological research. 500+ researchers helped. 4.9/5 rating.",
    "url": "https://www.rstudiodatalab.com",
    "sameAs": [
      "https://www.fiverr.com/drzubair11010"
    ]
  },
  "publisher": {
    "@type": "Organization",
    "name": "RStudio Data Lab",
    "url": "https://www.rstudiodatalab.com",
    "logo": {
      "@type": "ImageObject",
      "url": "https://blogger.googleusercontent.com/img/a/AVvXsEigqr2QHzVWYRP6N89q6Bu2cNjCvN7g8P5pWqQVmHfLWVUb2nrXfp7Qo64bmJN9M9rD8brW5SpBcLUTsAiT70iC0JCz1FXGNgN0GuylxoHsV18t19GD-s_tieNOwa36_bQ3vU9UN8X7GeGJD3SGQfSnDko4OV_cogw2fbliLPZgjPAOSwOpGI_Z9C3B8_DU=w150-h150-p-k-no-nu-rw-e90",
      "width": 512,
      "height": 512
    }
  },
  "datePublished": "2026-06-17",
  "dateModified": "2026-06-17",
  "url": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html",
  "inLanguage": "en-US",
  "articleSection": "Statistical Analysis",
  "keywords": [
    "one way anova in r",
    "ANOVA results interpretation",
    "how to report ANOVA results APA",
    "tukey hsd test r",
    "thesis data analysis service",
    "aov() r",
    "TukeyHSD r",
    "p value ANOVA",
    "treatment comparison",
    "agricultural trial",
    "crop science",
    "F statistic",
    "eta squared effect size",
    "Shapiro-Wilk test",
    "Levene test",
    "homogeneity of variance"
  ],
  "wordCount": 3000,
  "about": [
    {
      "@type": "Thing",
      "name": "Analysis of Variance",
      "sameAs": "https://en.wikipedia.org/wiki/Analysis_of_variance"
    },
    {
      "@type": "Thing",
      "name": "R (programming language)",
      "sameAs": "https://en.wikipedia.org/wiki/R_(programming_language)"
    },
    {
      "@type": "Thing",
      "name": "Tukey's range test",
      "sameAs": "https://en.wikipedia.org/wiki/Tukey%27s_range_test"
    },
    {
      "@type": "Thing",
      "name": "Crop yield",
      "sameAs": "https://en.wikipedia.org/wiki/Crop_yield"
    }
  ],
  "mentions": [
    {
      "@type": "SoftwareApplication",
      "name": "R",
      "operatingSystem": "Windows, macOS, Linux",
      "applicationCategory": "Statistical Software"
    },
    {
      "@type": "SoftwareApplication",
      "name": "SPSS",
      "operatingSystem": "Windows, macOS",
      "applicationCategory": "Statistical Software"
    },
    {
      "@type": "SoftwareApplication",
      "name": "Minitab",
      "operatingSystem": "Windows",
      "applicationCategory": "Statistical Software"
    }
  ],
  "isPartOf": {
    "@type": "Blog",
    "name": "RStudio Data Lab",
    "url": "https://www.rstudiodatalab.com"
  }
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Run One-Way ANOVA in R for a PhD Thesis (With Assumption Checks and APA Write-Up)",
  "description": "Step-by-step guide to running one-way ANOVA in R: Shapiro-Wilk normality test, Levene's test, aov() ANOVA, TukeyHSD post-hoc, and APA-format results reporting.",
  "image": {
    "@type": "ImageObject",
    "url": "https://www.rstudiodatalab.com/[PATH-TO-FEATURED-IMAGE].png",
    "width": 1200,
    "height": 630
  },
  "estimatedCost": {
    "@type": "MonetaryAmount",
    "currency": "USD",
    "value": "0",
    "description": "Free if you run it yourself in R. From $150 if you need it done for you."
  },
  "tool": [
    {
      "@type": "HowToTool",
      "name": "R (base package — aov, shapiro.test, TukeyHSD)"
    },
    {
      "@type": "HowToTool",
      "name": "car package (leveneTest)"
    },
    {
      "@type": "HowToTool",
      "name": "ggplot2 (Q-Q plot visualization)"
    }
  ],
  "supply": [
    {
      "@type": "HowToSupply",
      "name": "Crop yield dataset — one numeric response column and one treatment factor column"
    }
  ],
  "totalTime": "PT2H",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Check Normality with the Shapiro-Wilk Test",
      "text": "Fit the ANOVA model first with aov(), then pass the residuals to shapiro.test(). A p-value above .05 indicates no significant departure from normality. Also produce a Q-Q plot with qqnorm() and qqline() to inspect the distribution visually.",
      "url": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html#step-1",
      "image": {
        "@type": "ImageObject",
        "url": "https://blogger.googleusercontent.com/img/a/AVvXsEgEZ41UZGqpOfo5eJNh5ziBeMK-_PNIA0u3gwDbgUC88LgxN8cVWnlyBn5RB5H6mvB_484NhrNjriTRPi23gCAASQMOZEm_D4acIPL2uC7ieeqOiejI4ZIVpdrNi4JqTcsaJaBimebU4HcHM2zryQc_7wJ5qoYzU6MAkMXja125tI1tmC-vU1mX_3TBbJw"
      }
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Test Homogeneity of Variance with Levene's Test",
      "text": "Use leveneTest() from the car package with center = median (the Brown-Forsythe variant). A p-value above .05 indicates no significant difference in variances across groups, so the homogeneity assumption is considered met.",
      "url": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html#step-2",
      "image": {
        "@type": "ImageObject",
        "url": "https://blogger.googleusercontent.com/img/a/AVvXsEin6ZA5eYaF66HwBBku8LRkC8cFT-kILZHZbHPlQs2U10LGDShlWPUNui9NdB70GZ9Mr3gloTnPUO1o4O1CYTV1BuopJD5nBn-4HCsJLitdRz5Nxilf_RI7AavlfblmhvPnqU5aHRKOf4a2iUhBEAgRT7S8gG6qVw7Wj_lpE5eyouQOdALMkutcn1p7ylw"
      }
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Run the One-Way ANOVA Using aov()",
      "text": "Fit the model with aov(yield ~ treatment, data = crop_data) and inspect the output with summary(). Record the F statistic, degrees of freedom, and p-value. Compute eta-squared manually as SS_treatment divided by SS_total.",
      "url": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html#step-3",
      "image": {
        "@type": "ImageObject",
        "url": "https://www.rstudiodatalab.com/[PATH-TO-ANOVA-OUTPUT-IMAGE].png"
      }
    },
    {
      "@type": "HowToStep",
      "position": 4,
      "name": "Run Tukey HSD Post-Hoc to Identify Which Groups Differ",
      "text": "Pass the fitted aov model to TukeyHSD(model, conf.level = 0.95). The output gives pairwise mean differences, 95% confidence intervals, and Tukey-adjusted p-values for every group combination.",
      "url": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html#step-4",
      "image": {
        "@type": "ImageObject",
        "url": "https://blogger.googleusercontent.com/img/a/AVvXsEhM9rOHCxSHqMEFP_HSlr2QMC5kKg5l8aj4B_Nz251BCjbK5jq8UdLspjmljOc-L4tnrdpMngOgO40E2UneGWyrTLz35NsL6uIEQKKEERJm9FKl52JTyrk-QYj3ZI2XCpm9oaBmgIHgDtWG1OoKirNdiaMz5qJwblOS19tqlbl67bgmFjnMhwkZ5i7S3VA"
      }
    },
    {
      "@type": "HowToStep",
      "position": 5,
      "name": "Write the APA Results Section",
      "text": "Report: (1) the assumption check results with exact W and F statistics and p-values; (2) the main ANOVA result as F(df_between, df_within) = value, p = exact, eta-squared, and 95% CI; (3) each Tukey pairwise comparison with difference, 95% CI, and adjusted p-value. Never write p < .05; give the exact p-value, and use p < .001 only for values below .001.",
      "url": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html#step-5"
    }
  ]
}
</script>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do I run a one-way ANOVA in R?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use the aov() function from base R: model <- aov(yield ~ treatment, data = your_data), then inspect the result with summary(model). Before trusting the output, check normality of residuals with shapiro.test(residuals(model)) and homogeneity of variance with leveneTest() from the car package."
      }
    },
    {
      "@type": "Question",
      "name": "How do I report one-way ANOVA results in APA format?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "APA format requires: F(df_between, df_within) = value, p = exact_value, eta-squared = value, 95% CI [lower, upper]. For example: F(2, 24) = 22.15, p < .001, eta-squared = .65, 95% CI [.35, .76]. Report exact p-values to two or three decimals (e.g., p = .008) instead of thresholds such as p < .05; APA style uses the inequality only for values below .001, written p < .001. Also report Tukey post-hoc comparisons with pairwise differences, confidence intervals, and adjusted p-values."
      }
    },
    {
      "@type": "Question",
      "name": "What post-hoc test should I use after a significant one-way ANOVA in R?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use Tukey HSD (Honestly Significant Difference) when comparing all possible group pairs, as it controls the family-wise error rate. In R, run TukeyHSD(model, conf.level = 0.95) after fitting the aov() model. Tukey HSD is the most widely accepted post-hoc test for balanced ANOVA designs and is the method most PhD supervisors and journal reviewers expect."
      }
    },
    {
      "@type": "Question",
      "name": "Should I check normality before running ANOVA in R?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Run the Shapiro-Wilk test on the model residuals using shapiro.test(residuals(model)), not on the raw data values. A p-value above .05 indicates no significant departure from normality. Supplement with a Q-Q plot using qqnorm() and qqline(). Also run Levene's test for homogeneity of variance using leveneTest() from the car package."
      }
    },
    {
      "@type": "Question",
      "name": "What is eta-squared and how do I calculate it in R?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Eta-squared (eta²) measures effect size for ANOVA — the proportion of total variance explained by the treatment factor. Calculate it in R as: ss <- summary(model)[[1]][, 'Sum Sq']; eta2 <- ss[1] / sum(ss). Values of .01, .06, and .14 are typically interpreted as small, medium, and large effects. Always report eta² with its 95% confidence interval in your APA results."
      }
    },
    {
      "@type": "Question",
      "name": "Why was my Chapter 4 ANOVA rejected by my supervisor?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The two most common reasons are: (1) using multiple t-tests instead of a single ANOVA, which inflates the Type I error rate; and (2) running the ANOVA without first reporting assumption checks (Shapiro-Wilk normality test and Levene's homogeneity test). A third frequent issue is reporting 'p < .05' instead of the exact p-value, which does not meet APA standards. Each of these will typically prompt a supervisor rejection."
      }
    }
  ]
}
</script>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://www.rstudiodatalab.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Statistical Analysis",
      "item": "https://www.rstudiodatalab.com/search/label/Statistical%20Analysis"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "ANOVA in R",
      "item": "https://www.rstudiodatalab.com/search/label/ANOVA"
    },
    {
      "@type": "ListItem",
      "position": 4,
      "name": "How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student",
      "item": "https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html"
    }
  ]
}
</script>

</div></span></div></span>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rstudiodatalab.com/2026/06/how-i-used-anova-in-r-crop-yield-phd-thesis.html"> RStudioDataLab</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/how-i-used-one-way-anova-in-r-to-analyze-crop-yield-data-for-a-phd-student-real-case-study/">How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student (Real Case Study)</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402072</post-id>	</item>
		<item>
		<title>Auditing LLM Trading: Bridging Theory and Market Reality with the GT table in R</title>
		<link>https://www.r-bloggers.com/2026/06/auditing-llm-trading-bridging-theory-and-market-reality-with-the-gt-table-in-r/</link>
		
		<dc:creator><![CDATA[Selcuk Disci]]></dc:creator>
		<pubDate>Wed, 17 Jun 2026 08:05:04 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://datageeek.com/?p=12211</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Introduction: The Laboratorial Illusion In quantitative finance, Large Language Model (LLM) multi-agent systems are frequently celebrated for their theoretical intelligence. Financial data scientists spend months refining prompt semantics, building complex reasoning frameworks, and engineering multi-turn debate loops between specialized agent nodes. On paper—and within simulated environments—these networks demonstrate ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/auditing-llm-trading-bridging-theory-and-market-reality-with-the-gt-table-in-r/">Auditing LLM Trading: Bridging Theory and Market Reality with the GT table in R</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://datageeek.com/2026/06/17/auditing-llm-trading-bridging-theory-and-market-reality-with-the-gt-table-in-r/"> DataGeeek</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<h2 class="wp-block-heading">Introduction: The Laboratorial Illusion</h2>



<p class="wp-block-paragraph">In quantitative finance, Large Language Model (LLM) multi-agent systems are frequently celebrated for their theoretical intelligence. Financial data scientists spend months refining prompt semantics, building complex reasoning frameworks, and engineering multi-turn debate loops between specialized agent nodes. On paper—and within simulated environments—these networks demonstrate flawless predictive capabilities, capturing theoretical alpha with pristine efficiency.</p>



<p class="wp-block-paragraph">However, this laboratorial success cloaks a fatal vulnerability exposed by <strong><em><a href="https://arxiv.org/abs/2606.08285" rel="nofollow" target="_blank">Yao &#038; Zheng (2026)</a></em></strong>: traditional backtests systematically ignore execution semantics and market microstructure realities.</p>



<p class="wp-block-paragraph">In AI-driven trading systems, the primary risk is no longer the raw quality of the agent’s alpha signal; it is the <strong>cognitive latency</strong> required to generate that signal. While classical high-frequency algorithms compete on microseconds, LLM multi-agent networks engage in multi-second internal debates. When this cognitive inertia has to execute in highly volatile regimes, it becomes a silent alpha killer. Yao &#038; Zheng (2026) push us to stop judging agent architectures by their abstract intelligence and start auditing them against the financial reality of their execution timing.</p>



<p class="wp-block-paragraph">To dismantle this illusion, this article implements a validation framework in R designed to audit multi-agent trading decisions against empirical market constraints. Rather than treating transaction costs as a passive post-trade deduction, our framework feeds execution slippage directly into the ranking layer of the portfolio generation process, as shown in the finalized <strong>Targeted Reproducibility &#038; Execution Realism Matrix</strong> at the end of this article.</p>



<p class="wp-block-paragraph">Let’s break down the code block by block to see exactly how this audit engine operates, starting with the core dependencies and temporal isolation logic.</p>



<h2 class="wp-block-heading">Part 2: Environment Setup &#038; The Auditing Interface</h2>



<p class="wp-block-paragraph">The first step of our script loads the required quantitative packages and defines our core auditing function.</p>


<pre>
library(tidyquant)
library(dplyr)
library(tibble)
library(purrr)
library(gt)

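# Audit a single agent decision against market frictions.
# The function body continues across the code blocks in Parts 3 to 6.
audit_execution_assumptions &lt;- function(
  ticker, action, trade_date, order_size, latency_seconds,
  base_fee_bps = 10, ideal_rank = NA, audited_rank = NA
) {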
</pre>


<h3 class="wp-block-heading">Deconstructing the Operational Parameters</h3>



<p class="wp-block-paragraph">To test how an LLM agent’s decisions survive real market microstructure, our <code>audit_execution_assumptions</code> function requires explicit operational parameters. Here is the practical quantitative intuition behind each input:</p>



<ul class="wp-block-list">
<li><strong><code>ticker</code>:</strong> The asset symbol being audited (e.g., <code>&quot;AMD&quot;</code>, <code>&quot;TSLA&quot;</code>). It tells the engine exactly which market pricing stream to fetch.</li>



<li><strong><code>action</code>:</strong> The order side generated by the multi-agent system—strictly <code>&quot;BUY&quot;</code> or <code>&quot;SELL&quot;</code>. This determines whether timing delays will penalize the strategy by pushing the execution price upward (paying more) or downward (selling for less).</li>



<li><strong><code>trade_date</code>:</strong> The exact calendar day of the intended trade (<code>&quot;YYYY-MM-DD&quot;</code>). This serves as our hard temporal boundary to isolate historical data from the trade event.</li>



<li><strong><code>order_size</code>:</strong> The number of shares being transacted. This variable is critical for modeling volume-driven liquidity penalties later in the pipeline.</li>



<li><strong><code>latency_seconds</code>:</strong> The time (in seconds) the LLM spent running its internal reasoning chains and debate loops. This is the master variable driving our time-based slippage penalty.</li>



<li><strong><code>base_fee_bps</code>:</strong> Proportional transaction and clearing costs, expressed in basis points of traded notional (1 bp = 0.01%). It defaults to 10 bps.</li>



<li><strong><code>ideal_rank</code> & <code>audited_rank</code>:</strong> Optional pass-through values (both default to <code>NA</code>) that are copied into the output row. <code>ideal_rank</code> records the agent’s raw theoretical preference, while <code>audited_rank</code> records the asset’s position once market frictions are taken into account.</li>
</ul>
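<p class="wp-block-paragraph">As written, <code>audit_execution_assumptions()</code> rejects an invalid <code>action</code> only after the price download has already run. A minimal pre-flight check could catch malformed inputs before any network call. The helper <code>validate_audit_inputs()</code> below is hypothetical, and its rules (a single non-empty ticker, a <code>YYYY-MM-DD</code> date, a positive order size, non-negative latency and fees) are our own assumptions rather than part of the original script.</p>

```r
# Hypothetical pre-flight validator for audit_execution_assumptions() inputs.
# Returns a character vector describing each problem (empty if inputs look valid).
validate_audit_inputs <- function(ticker, action, trade_date, order_size,
                                  latency_seconds, base_fee_bps = 10) {
  problems <- character(0)
  if (!is.character(ticker) || length(ticker) != 1 || !nzchar(ticker)) {
    problems <- c(problems, "ticker must be a single non-empty string")
  }
  if (!action %in% c("BUY", "SELL")) {
    problems <- c(problems, "action must be BUY or SELL")
  }
  # as.Date() returns NA when the string does not match the format
  parsed <- tryCatch(as.Date(trade_date, format = "%Y-%m-%d"),
                     error = function(e) NA)
  if (is.na(parsed)) {
    problems <- c(problems, "trade_date must be YYYY-MM-DD")
  }
  if (!is.numeric(order_size) || order_size <= 0) {
    problems <- c(problems, "order_size must be a positive number")
  }
  if (!is.numeric(latency_seconds) || latency_seconds < 0) {
    problems <- c(problems, "latency_seconds must be non-negative")
  }
  if (!is.numeric(base_fee_bps) || base_fee_bps < 0) {
    problems <- c(problems, "base_fee_bps must be non-negative")
  }
  problems
}

# Usage: a valid call returns character(0); an invalid one lists the issues.
validate_audit_inputs("AMD", "BUY", "2026-05-12", 2500, 4.2)
validate_audit_inputs("AMD", "HOLD", "2026-05-12", -1, 4.2)
```

<p class="wp-block-paragraph">Calling such a check at the top of the auditing function, and stopping when it returns anything, would fail fast without spending a data request on a trade that can never be audited.</p>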



<h2 class="wp-block-heading">Part 3: Point-in-Time Control & Temporal Split Discipline</h2>



<p class="wp-block-paragraph">Now that our environment is ready, the function’s first critical task is to draw a strict line in time. It isolates historical data from the execution day data to ensure that future prices cannot leak into our calculations.</p>


<pre>
# 1. Point-in-Time Control & Temporal Split Discipline
  end_date &lt;- as.Date(trade_date)
  start_date &lt;- end_date - 45
  
  market_data &lt;- tq_get(ticker, from = start_date, to = end_date + 1)
  
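  # tq_get() may return NA (with a warning) instead of a tibble when a download fails,
  # so check the type before counting rows
  if (!is.data.frame(market_data) || nrow(market_data) == 0) {
    stop(&quot;Audit Halted: Live data provenance check failed. Verify market calendar.&quot;)
  }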
  
  execution_day_data &lt;- market_data %&gt;% filter(date == end_date)
  historical_series  &lt;- market_data %&gt;% filter(date &lt; end_date)
  
  if (nrow(execution_day_data) == 0) {
    stop(&quot;Audit Halted: Target trade date appears to be a market holiday/weekend.&quot;)
  }
  
  arrival_price &lt;- execution_day_data$open[1]
</pre>


<h3 class="wp-block-heading">Understanding the Internal Compliance Variables</h3>



<p class="wp-block-paragraph">To understand how this block enforces strict backtesting rules, let’s look at what each internal variable does:</p>



<ul class="wp-block-list">
<li><strong><code>end_date</code> & <code>start_date</code>:</strong> These variables convert the character <code>trade_date</code> into an R Date object and define a 45-calendar-day lookback window (roughly 30 trading sessions) before the trade. The 45-day length is our own implementation choice, intended to give a reasonably stable volatility estimate; its purpose is to satisfy Yao & Zheng’s (2026) requirement that past information be isolated from the current trade event.</li>



<li><strong><code>market_data</code>:</strong> The raw data table downloaded via <code>tidyquant</code>. It requests prices up to <code>end_date + 1</code> so that the target date’s session is included even if the data source treats the upper bound as exclusive; any row for the following day is excluded by both date filters.</li>



<li><strong><code>historical_series</code>:</strong> A clean pricing array containing data strictly <em>before</em> the trade date. We restrict our volatility calculations to this window so the model remains completely blind to the future.</li>



<li><strong><code>execution_day_data</code>:</strong> Filters market activity down to the exact day of the trade. If this data frame turns up empty—meaning the agent tried to submit a trade on a weekend or a market holiday—the engine calls a hard <code>stop()</code> and terminates the run.</li>



<li><strong><code>arrival_price</code>:</strong> The stock’s <code>open</code> price on the execution day. It serves as a proxy for the price at the moment the agent’s decision is triggered, before any reasoning latency or market friction is applied, and is the baseline anchor for every cost calculation that follows.</li>
</ul>
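<p class="wp-block-paragraph">The temporal split can be checked without any network access. The sketch below builds a small synthetic price table (the dates and prices are invented for illustration, and weekends are not removed) that, like the <code>tq_get()</code> call, extends one day past the trade date. It then applies the same two date filters in base R and asserts that nothing dated on or after the trade date leaks into the history.</p>

```r
# Synthetic daily prices; the values are illustrative, not market data.
trade_date <- as.Date("2026-05-12")
market_data <- data.frame(
  date  = seq(trade_date - 10, trade_date + 1, by = "day"),
  open  = seq(100, 111, by = 1),
  close = seq(100.5, 111.5, by = 1)
)

# Same partition logic as the audit function, written in base R.
execution_day_data <- market_data[market_data$date == trade_date, ]
historical_series  <- market_data[market_data$date <  trade_date, ]

# Point-in-time guarantees: exactly one execution row, history strictly earlier.
stopifnot(nrow(execution_day_data) == 1)
stopifnot(all(historical_series$date < trade_date))

arrival_price <- execution_day_data$open[1]
arrival_price
```

<p class="wp-block-paragraph">The extra day after the trade date is excluded by both filters, and <code>arrival_price</code> comes from the trade date’s open (110 in this toy table).</p>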



<h2 class="wp-block-heading">Part 4: Mathematical Volatility & Timing Slippage Modeling</h2>



<p class="wp-block-paragraph">Once we have our clean data partitions, we scale the asset’s historical volatility down to a per-second level. This allows us to convert the agent’s cognitive delay directly into a financial price penalty.</p>


<pre>
# 2. Mathematical Volatility Modeling
  historical_vol &lt;- historical_series %&gt;%
    mutate(log_ret = log(close / lag(close))) %&gt;%
    summarise(vol = sd(log_ret, na.rm = TRUE) * sqrt(252)) %&gt;%
    pull(vol)
  
  volatility_per_second &lt;- (historical_vol / sqrt(252)) / 23400
  
  # 3. Execution Timing Latency (Timing Slippage)
  timing_slippage_dist &lt;- arrival_price * volatility_per_second * latency_seconds
  
  if (action == &quot;BUY&quot;) {
    execution_price &lt;- arrival_price + timing_slippage_dist
  } else if (action == &quot;SELL&quot;) {
    execution_price &lt;- arrival_price - timing_slippage_dist
  } else {
    stop(&quot;Audit Halted: Invalid execution semantics. Side must be BUY or SELL.&quot;)
  }
</pre>


<h3 class="wp-block-heading">Deconstructing the Mathematical Variables</h3>



<ul class="wp-block-list">
<li><strong><code>historical_vol</code>:</strong> The annualized volatility of the asset: the standard deviation of daily log returns over the lookback window, multiplied by <code>sqrt(252)</code>. It represents the asset’s baseline speed of movement over a normal trading year.</li>



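<li><strong><code>volatility_per_second</code>:</strong> This variable scales the annualized risk down to a single trading second. It first converts annualized volatility back to daily volatility (dividing by <code>sqrt(252)</code>) and then divides by 23,400, the number of seconds in a standard 6.5-hour US market session (6.5 × 3,600). Note that this pro-rates volatility linearly over the session; under a random-walk assumption, standard deviation scales with the square root of time, so the per-second figure would instead be daily volatility divided by the square root of 23,400, a much larger number.</li>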



<li><strong><code>timing_slippage_dist</code>:</strong> The per-share dollar penalty caused by the agent’s delay: the arrival price multiplied by the per-second volatility and by <code>latency_seconds</code>.</li>



<li><strong><code>execution_price</code>:</strong> The modeled, latency-adjusted price our trade hits. If the action is <code>&quot;BUY&quot;</code>, the timing delay forces us to pay <em>more</em> (<code>arrival_price + timing_slippage_dist</code>). If the action is <code>&quot;SELL&quot;</code>, the delay forces us to sell for <em>less</em> (<code>arrival_price - timing_slippage_dist</code>).</li>
</ul>
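<p class="wp-block-paragraph">For one hypothetical trade (arrival price $100, 40% annualized volatility, 5 seconds of latency; all three values are assumptions), the sketch below evaluates the per-share timing penalty under the linear pro-rating used in the script and, for comparison, under square-root-of-time scaling, which is what a random-walk model of prices implies. With these inputs the linear rule yields roughly 0.05 bps, whereas square-root scaling yields roughly 3.7 bps. Because the linear penalty is so small, the timing term appears to contribute only a fraction of a basis point in the Part 7 simulation, where the volume term alone is 25 bps (2,500 × 0.000001).</p>

```r
# Hypothetical inputs for a single trade.
arrival_price   <- 100          # USD per share
annual_vol      <- 0.40         # annualized volatility of log returns
latency_seconds <- 5            # agent reasoning delay
session_seconds <- 6.5 * 3600   # 23,400 seconds in a US equity session

daily_vol <- annual_vol / sqrt(252)

# Rule used in the script: daily volatility pro-rated linearly per second.
linear_per_second <- daily_vol / session_seconds
linear_penalty    <- arrival_price * linear_per_second * latency_seconds

# Random-walk alternative: standard deviation grows with sqrt(elapsed time).
sqrt_per_second <- daily_vol / sqrt(session_seconds)
sqrt_penalty    <- arrival_price * sqrt_per_second * sqrt(latency_seconds)

# Express both per-share dollar penalties in basis points of the arrival price.
round(c(linear_bps = linear_penalty / arrival_price * 1e4,
        sqrt_bps   = sqrt_penalty   / arrival_price * 1e4), 3)
```

<p class="wp-block-paragraph">The two rules coincide only when the latency equals a full 23,400-second session; for any shorter delay, the linear rule produces the smaller penalty.</p>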



<h2 class="wp-block-heading">Part 5: Institutional Friction & Turnover Cost Modeling</h2>



<p class="wp-block-paragraph">With the timing-degraded execution price established, the framework applies structural volume frictions. This step calculates fixed brokerage costs alongside non-linear market impact caused by our position size.</p>


<pre>
# 4. Institutional Friction & Turnover Cost Modeling (Volume Slippage)
  commission_cost     &lt;- execution_price * order_size * (base_fee_bps / 10000)
  liquidity_slippage  &lt;- execution_price * order_size * (order_size * 0.000001) 
  total_friction_cost &lt;- commission_cost + liquidity_slippage
  
  # Aggregating absolute slippage profiles for matrix visibility
  total_slippage_usd &lt;- (abs(execution_price - arrival_price) * order_size) + liquidity_slippage
  slippage_bps       &lt;- (total_slippage_usd / (arrival_price * order_size)) * 10000
</pre>


<h3 class="wp-block-heading">Deconstructing the Friction Variables</h3>



<ul class="wp-block-list">
<li><strong><code>commission_cost</code>:</strong> The baseline institutional clearing and exchange fee. It converts your fixed basis points (<code>base_fee_bps</code>) into a hard dollar cost based on the total value of the executed position.</li>



<li><strong><code>liquidity_slippage</code>:</strong> A stylized, non-linear market impact model. In real equity microstructure, large block trades cannot execute instantly at a single price; they must sweep through multiple price levels of the limit order book. Here the per-share impact is <code>order_size * 0.000001</code> of the price (25 bps for 2,500 shares), so the dollar cost grows with the square of the order size. This multiplier is our localized choice and is not scaled by the stock’s traded volume, so the same order size is penalized identically for every ticker.</li>



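<li><strong><code>total_friction_cost</code>:</strong> The sum of broker fees and market impact, representing the total overhead deducted from the position. It is computed for reference only and is not carried into the output table, so the reported slippage figures below exclude commissions.</li>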



<li><strong><code>total_slippage_usd</code>:</strong> The total dollar amount lost to market mechanics. It adds the money lost from the agent’s thinking delay (<code>abs(execution_price - arrival_price) * order_size</code>) to the money lost from sweeping the order book (<code>liquidity_slippage</code>).</li>



<li><strong><code>slippage_bps</code>:</strong> Standardizes the total dollar slippage back into basis points relative to the original intended position size. This allows us to compare execution damage cleanly across symbols with entirely different stock prices.</li>
</ul>
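<p class="wp-block-paragraph">The friction arithmetic is easy to verify by hand. The function <code>friction_breakdown()</code> below is a hypothetical standalone restatement of the script’s formulas, with the 0.000001 impact multiplier exposed as an assumed <code>impact_coef</code> argument. With timing slippage switched off (execution price equal to arrival price), a 2,500-share order at $100 costs $250 in commission and $625 in volume impact, i.e. 25 bps of slippage; doubling the order to 5,000 shares quadruples the impact to $2,500 (50 bps).</p>

```r
# Hypothetical restatement of the script's friction formulas.
friction_breakdown <- function(arrival_price, execution_price, order_size,
                               base_fee_bps = 10, impact_coef = 1e-6) {
  # Proportional fee on executed notional
  commission_cost    <- execution_price * order_size * (base_fee_bps / 10000)
  # Per-share impact grows linearly with size, so dollar impact is quadratic
  liquidity_slippage <- execution_price * order_size * (order_size * impact_coef)
  # Timing loss plus volume impact (commissions excluded, as in the script)
  total_slippage_usd <- abs(execution_price - arrival_price) * order_size +
    liquidity_slippage
  slippage_bps <- total_slippage_usd / (arrival_price * order_size) * 10000
  c(commission     = commission_cost,
    liquidity      = liquidity_slippage,
    total_friction = commission_cost + liquidity_slippage,
    slippage_bps   = slippage_bps)
}

# Timing slippage switched off: execution price equals arrival price.
friction_breakdown(100, 100, 2500)
friction_breakdown(100, 100, 5000)
```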



<h2 class="wp-block-heading">Part 6: Reproducibility Grading & Data Ingestion Matrix Output</h2>



<p class="wp-block-paragraph">Before returning any data, the function evaluates its own cost assumptions. It scores the configuration out of 100 to flag backtests that omit fees or market impact, and then outputs a clean data row.</p>


<pre>
# 5. Reproducibility & Interpretability Score Evaluation
  reproducibility_score &lt;- 100
  if (liquidity_slippage == 0) reproducibility_score &lt;- reproducibility_score - 40
  if (base_fee_bps == 0)       reproducibility_score &lt;- reproducibility_score - 30
  
  evaluation_status &lt;- case_when(
    reproducibility_score &gt;= 85 ~ &quot;EXCELLENT / Economically Interpretable&quot;,
    reproducibility_score &gt;= 50 ~ &quot;PASS / Limited Realism&quot;,
    TRUE                         ~ &quot;FAIL / Methodological Illusion&quot;
  )
  
  # 6. Construct Raw Data Frame for gt Engine with exact mathematical parameters
  raw_matrix_df &lt;- tibble(
    Strategy      = paste0(&quot;Agent on &quot;, ticker),
    Ideal_Rank    = as.integer(ideal_rank),
    Audited_Rank  = as.integer(audited_rank),
    PIT_Control   = &quot;PASSED (Zero Look-Ahead)&quot;,
    Leakage_Guard = &quot;SECURE (Discipline Enforced)&quot;,
    Slip_BPs      = slippage_bps,
    Slip_USD      = total_slippage_usd,
    Friction_Mod  = paste0(&quot;Dynamic (&quot;, base_fee_bps, &quot; bps + Volume)&quot;),
    Turnover_Tr   = &quot;Penalized Alpha Decay&quot;,
    Latency_Mod   = paste0(&quot;Empirical Vol (&quot;, latency_seconds, &quot;s)&quot;),
    Score         = reproducibility_score,
    Status        = evaluation_status
  )
  
  return(raw_matrix_df)
}
</pre>


<h3 class="wp-block-heading">Understanding the Structural Matrix Variables</h3>



<ul class="wp-block-list">
<li><strong><code>reproducibility_score</code> & <code>evaluation_status</code>:</strong> A self-policing diagnostic. If a backtest runs with no fees, the engine deducts 30 points; with no volume penalty, it deducts 40. Only when both are missing does the score (30) fall below 50 and flag the setup as a <code>Methodological Illusion</code>, warning you that the strategy may look profitable simply because it ignores real-world trading costs.</li>



<li><strong><code>raw_matrix_df</code>:</strong> The core data frame returned by the function. Notice that <code>Ideal_Rank</code> and <code>Audited_Rank</code> are forced into the data layer as standard integer variables. This ensures our portfolio analytics are handled strictly at the data layer before any styling or formatting takes place.</li>
</ul>
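<p class="wp-block-paragraph">Because the deductions are fixed (40 points for zero liquidity slippage, 30 for zero fees), the score can take only four values. The sketch below restates the rule in base R, using <code>if</code>/<code>else</code> instead of <code>dplyr::case_when()</code>; the name <code>score_config()</code> and the non-zero placeholder values (625 and 10) are illustrative assumptions. Only the configuration with neither fees nor volume impact falls below 50.</p>

```r
# Hypothetical base-R restatement of the script's scoring rule.
score_config <- function(liquidity_slippage, base_fee_bps) {
  score <- 100
  if (liquidity_slippage == 0) score <- score - 40
  if (base_fee_bps == 0)       score <- score - 30
  status <- if (score >= 85) {
    "EXCELLENT / Economically Interpretable"
  } else if (score >= 50) {
    "PASS / Limited Realism"
  } else {
    "FAIL / Methodological Illusion"
  }
  data.frame(score = score, status = status)
}

# Enumerate the four configurations (each term either non-zero or zero).
configs <- expand.grid(liquidity_slippage = c(625, 0), base_fee_bps = c(10, 0))
do.call(rbind, Map(score_config, configs$liquidity_slippage, configs$base_fee_bps))
```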



<h2 class="wp-block-heading">Part 7: High-Density Portfolio Execution Flow (The Simulation Sandbox)</h2>



<p class="wp-block-paragraph">Now that our core auditing function is defined, <strong>we need to build a simulation environment to stress-test it.</strong> In live trading, an investor relies on a priority ranking to decide capital allocation.</p>



<p class="wp-block-paragraph">To see exactly how cognitive latency disrupts this priority list, our script implements a <strong>Two-Pass Simulation Pipeline</strong> via <code>purrr::pmap_dfr</code>. Pass 1 runs a localized sweep to gather raw market frictions across a simulated portfolio, and Pass 2 injects those generated frictions back into the function to establish the final, adjusted priority order.</p>


<pre>
# ==============================================================================
# HIGH-DENSITY PORTFOLIO EXECUTION FLOW WITH STRUCTURAL RAW PARAMETERS
# ==============================================================================

# 1. Define ideal agent priority ranking inside map database
ideal_agent_ranks &lt;- tibble(
  ticker     = c(&quot;AMD&quot;, &quot;META&quot;, &quot;TSLA&quot;, &quot;MSFT&quot;, &quot;NFLX&quot;, &quot;GOOGL&quot;, &quot;NVDA&quot;, &quot;AAPL&quot;, &quot;AMZN&quot;, &quot;AVGO&quot;),
  Ideal_Rank = 1:10
)

# 2. Phase 1: Temporary execution mapping to capture raw slippage arrays
set.seed(42)
initial_inputs &lt;- tibble(
  ticker          = ideal_agent_ranks$ticker,
  action          = sample(c(&quot;BUY&quot;, &quot;SELL&quot;), nrow(ideal_agent_ranks), replace = TRUE, prob = c(0.6, 0.4)),
  trade_date      = &quot;2026-05-12&quot;,
  order_size      = 2500,
  latency_seconds = round(runif(nrow(ideal_agent_ranks), 3.5, 7.5), 1),
  base_fee_bps    = 10,
  ideal_rank      = ideal_agent_ranks$Ideal_Rank
)

# Run a localized sweep to compute absolute slippage values for explicit rank calculation
audited_ranks_map &lt;- pmap_dfr(initial_inputs, function(...) {
  args &lt;- list(...)
  audit_execution_assumptions(
    ticker          = args$ticker, 
    action          = args$action, 
    trade_date      = args$trade_date, 
    order_size      = args$order_size, 
    latency_seconds = args$latency_seconds, 
    base_fee_bps    = args$base_fee_bps,
    ideal_rank      = args$ideal_rank
  )
}) %&gt;%
  mutate(ticker = stringr::str_remove(Strategy, &quot;Agent on &quot;)) %&gt;%
  mutate(Calculated_Audited_Rank = min_rank(desc(Slip_BPs))) %&gt;%
  select(ticker, Calculated_Audited_Rank)

# 3. Phase 2: Inject both explicit ranks into the pipeline structure
portfolio_inputs &lt;- initial_inputs %&gt;%
  left_join(audited_ranks_map, by = &quot;ticker&quot;) %&gt;%
  rename(audited_rank = Calculated_Audited_Rank)

# 4. Generate final portfolio data matrix with dual ranking embedded in the raw layer
portfolio_matrix_df &lt;- pmap_dfr(portfolio_inputs, audit_execution_assumptions) %&gt;%
  mutate(Rank_Shift = Ideal_Rank - Audited_Rank) %&gt;%
  mutate(Ranking_Perturbation = paste0(&quot;Rank Decay: Node &quot;, Audited_Rank, &quot; (Shift: &quot;, Rank_Shift, &quot;)&quot;)) %&gt;%
  arrange(Audited_Rank)
</pre>


<h3 class="wp-block-heading">Deconstructing the Simulation Logic & Generated Variables</h3>



<p class="wp-block-paragraph">To keep things transparent: <strong>the code above is not a live execution engine; it is a synthetic playground</strong> built to show how the math behaves across a mock 10-stock universe. The prices are real historical data retrieved by <code>tq_get()</code>, but the actions, latencies, order sizes, and ideal ranks are simulated:</p>



<ul class="wp-block-list">
<li><strong><code>ideal_agent_ranks</code>:</strong> This is our baseline control vector. It represents a mock scenario where an LLM agent has already ranked 10 stocks from best (<code>Ideal_Rank = 1</code> for AMD) to worst (<code>Ideal_Rank = 10</code> for AVGO) based purely on theoretical signals.</li>



<li><strong><code>initial_inputs</code> (The Environment Matrix):</strong> This table creates our simulated trade parameters. Each stock is assigned a randomly sampled side (<code>&quot;BUY&quot;</code> with probability 0.6, <code>&quot;SELL&quot;</code> with 0.4) and trades an identical block of <code>2500</code> shares on a fixed historical date (<code>2026-05-12</code>). Crucially, <code>runif(..., 3.5, 7.5)</code> <strong>simulates a random cognitive delay between 3.5 and 7.5 seconds</strong> (rounded to one decimal place), a rough stand-in for the time an LLM spends traversing multi-turn debate loops or long reasoning chains before hitting the market.</li>



<li><strong><code>audited_ranks_map</code> (The First Pass):</strong> This acts as our pre-trade exploratory sweep. Because we cannot rank the stocks by execution damage until we know what that damage is, this pass calls our function to calculate <code>Slip_BPs</code> for each asset. It then uses <code>min_rank(desc(Slip_BPs))</code> to generate <code>Calculated_Audited_Rank</code>, which assigns rank 1 to the asset with the <em>largest</em> slippage in basis points, so the audited ordering sorts assets by execution damage. Ranking assets by how well they survived slippage would instead use <code>min_rank(Slip_BPs)</code>.</li>



<li><strong><code>portfolio_inputs</code> & <code>portfolio_matrix_df</code> (The Second Pass):</strong> This forms our final consolidation loop. We combine our initial trade parameters with the computed audited ranks using a standard <code>left_join</code>. Then we run the auditing function (including its price download) one final time so that both ranking layers are embedded in the final output.</li>



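<li><strong><code>Rank_Shift</code> & <code>Ranking_Perturbation</code>:</strong> The key diagnostic variables of our simulation. <code>Rank_Shift</code> is computed as <code>Ideal_Rank - Audited_Rank</code>, and <code>Ranking_Perturbation</code> turns it into the readable <strong>Rank Decay</strong> label shown in the table. A shift of zero means the asset kept its slot; a positive shift means it sits closer to rank 1 in the slippage-ordered audited list than in the agent’s ideal list, and a negative shift means the opposite. The size of the shift shows how far the combination of the asset’s own volatility and the agent’s processing delay moved it.</li>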
</ul>
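<p class="wp-block-paragraph">The direction of the audited ranking is worth checking on toy numbers. In dplyr, <code>min_rank(desc(x))</code> gives rank 1 to the largest value; base R’s <code>rank(-x, ties.method = &quot;min&quot;)</code> is equivalent for numeric data without missing values. The <code>Slip_BPs</code> figures below are made up for illustration. The sketch also shows that, because Pass 1 already returns every <code>Slip_BPs</code> value, the audited rank and <code>Rank_Shift</code> could be attached to that output directly instead of calling the auditing function (and downloading prices) a second time; this one-pass variant is our suggestion, not the original design.</p>

```r
# Made-up Pass 1 output: three assets in ideal order with their slippage.
pass1 <- data.frame(
  ticker     = c("AMD", "META", "TSLA"),
  Ideal_Rank = 1:3,
  Slip_BPs   = c(25.02, 25.10, 25.05)
)

# Script's rule, base-R equivalent of min_rank(desc(Slip_BPs)):
# rank 1 goes to the asset with the LARGEST slippage.
pass1$Audited_Rank <- rank(-pass1$Slip_BPs, ties.method = "min")

# Alternative rule: rank 1 goes to the asset with the SMALLEST slippage.
pass1$Survival_Rank <- rank(pass1$Slip_BPs, ties.method = "min")

# Same sign convention as the script: Rank_Shift = Ideal_Rank - Audited_Rank.
pass1$Rank_Shift <- pass1$Ideal_Rank - pass1$Audited_Rank
pass1[order(pass1$Audited_Rank), ]
```

<p class="wp-block-paragraph">Under the script’s rule, META (highest slippage) moves to rank 1 and AMD drops to rank 3; under the survival rule, the order flips.</p>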



<h2 class="wp-block-heading">Part 8: The Professional Visualization Layer (Renderer)</h2>



<p class="wp-block-paragraph">With our data matrix fully computed inside the simulation sandbox, the final segment of our script passes the raw data frame directly into the <code>gt</code> visualization package. This block formats numbers, colors labels, and applies conditional logic to transform our raw tibble into the high-density corporate matrix seen in our audit results.</p>


<pre>
# ==============================================================================
# PROFESSIONAL VISUALIZATION LAYER (RENDERER)
# ==============================================================================
gt_audit_report &lt;- portfolio_matrix_df %&gt;%
  select(Strategy, Ideal_Rank, Audited_Rank, Ranking_Perturbation, PIT_Control, Leakage_Guard, 
         Slip_BPs, Slip_USD, Friction_Mod, Turnover_Tr, Latency_Mod, Score, Status) %&gt;%
  gt() %&gt;%
  tab_header(
    title = md(&quot;**Targeted Reproducibility & Execution Realism Matrix**&quot;),
    subtitle = paste0(&quot;Methodological Rigor Audit inspired by Yao & Zheng (2026) | Generated: &quot;, Sys.Date())
  ) %&gt;%
  cols_label(
    Strategy             = &quot;Audited LLM Strategy&quot;,
    Ideal_Rank           = &quot;Ideal Rank&quot;,
    Audited_Rank         = &quot;Audited Rank&quot;,
    Ranking_Perturbation = &quot;Ranking Perturbation&quot;,
    PIT_Control          = &quot;Point-in-Time Control&quot;,
    Leakage_Guard        = &quot;Data Leakage Guard&quot;,
    Slip_BPs             = &quot;Slippage (BPs)&quot;,
    Slip_USD             = &quot;Slippage (USD)&quot;,
    Friction_Mod         = &quot;Transaction-Cost Modeling&quot;,
    Turnover_Tr          = &quot;Turnover Treatment&quot;,
    Latency_Mod          = &quot;Execution Timing Latency&quot;,
    Score                = &quot;Rigor Score&quot;,
    Status               = &quot;Evaluation Status&quot;
  ) %&gt;%
  fmt_currency(columns = Slip_USD, currency = &quot;USD&quot;, decimals = 2) %&gt;%
  fmt_number(columns = Slip_BPs, decimals = 2) %&gt;%
  fmt_number(columns = c(Ideal_Rank, Audited_Rank), decimals = 0) %&gt;%
  fmt_number(columns = Score, decimals = 0, pattern = &quot;{x}%&quot;) %&gt;%
  tab_options(
    heading.title.font.size = px(18),
    heading.subtitle.font.size = px(13),
    column_labels.font.weight = &quot;bold&quot;,
    column_labels.background.color = &quot;#F4F6F7&quot;,
    table.font.names = &quot;Arial, sans-serif&quot;,
    data_row.padding = px(6),
    table.width = pct(100)
  ) %&gt;%
  tab_style(
    style = cell_text(color = &quot;#C0392B&quot;, weight = &quot;bold&quot;),
    locations = cells_body(columns = Ranking_Perturbation)
  ) %&gt;%
  tab_style(
    style = cell_text(color = &quot;#27AE60&quot;, weight = &quot;bold&quot;),
    locations = cells_body(columns = Status, rows = Score &gt;= 85)
  ) %&gt;%
  tab_style(
    style = cell_text(color = &quot;#C0392B&quot;, weight = &quot;bold&quot;),
    locations = cells_body(columns = Status, rows = Score &lt; 50)
  ) %&gt;%
  opt_row_striping()

# Display the multi-asset audited dashboard inside the RStudio Viewer pane
gt_audit_report
</pre>


<h3 class="wp-block-heading">Deconstructing the Presentation & Formatting Variables</h3>



<p class="wp-block-paragraph">The final rendering sequence leverages the <code>gt</code> package to map raw numerical matrices into a standardized institutional report. The formatting layer operates under strict visual rules to maximize data density and audit clarity:</p>



<ul class="wp-block-list">
<li><strong><code>cols_label()</code>:</strong> This function swaps out our machine-readable data names for human-friendly table headers. For example, it maps the raw variable <code>Slip_BPs</code> to <code>&quot;Slippage (BPs)&quot;</code> so institutional readers can scan the table without guessing what the column fields represent.</li>



<li><strong><code>fmt_currency()</code> & <code>fmt_number()</code>:</strong> These are our value formatters. <code>fmt_currency()</code> renders <code>Slip_USD</code> as US dollars with two decimals, while <code>fmt_number()</code> rounds <code>Slip_BPs</code> to two decimals, shows the rank columns without decimals, and uses <code>pattern = &quot;{x}%&quot;</code> to append a percent sign to <code>Score</code>.</li>



<li><strong><code>tab_options()</code>:</strong> Controls the structural design and geometry of the table. It formats header font sizes, tightens row padding to increase information density, and sets a clean, professional background color (<code>#F4F6F7</code>) for the column header labels.</li>



<li><strong><code>tab_style()</code>:</strong> Enforces data-driven visual rules. It scans our data and automatically formats text color based on execution metrics:
<ul class="wp-block-list">
<li>It isolates the <code>Ranking_Perturbation</code> messages and renders them in bold crimson text to instantly draw focus to rank decay nodes.</li>



<li>It dynamically styles the <code>Status</code> column, coloring the text green for well-specified runs (<code>Score &gt;= 85</code>) or red for unrealistic backtest assumptions (<code>Score &lt; 50</code>); scores from 50 to 84 keep the default color.</li>
</ul>
</li>



<li><strong><code>opt_row_striping()</code>:</strong> Generates alternating zebra striping across rows, making it easier to follow the wide rows of this 13-column table.</li>
</ul>
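<p class="wp-block-paragraph">The two conditional <code>tab_style()</code> rules leave a middle band unstyled. Evaluating the same predicates directly on the data before rendering makes it easy to see which <code>Status</code> cells will be colored. The scores below are hypothetical, and the helper <code>status_color()</code> is our own illustration, not a gt function; the <code>Label</code> column simply mimics the <code>pattern = &quot;{x}%&quot;</code> formatting.</p>

```r
# Hypothetical helper mirroring the two conditional tab_style() rules.
status_color <- function(score) {
  ifelse(score >= 85, "#27AE60",           # green: Score >= 85
         ifelse(score < 50, "#C0392B",     # red: Score < 50
                NA_character_))            # 50-84: left unstyled
}

scores <- c(100, 70, 60, 30)
data.frame(Score = scores,
           Label = sprintf("%d%%", as.integer(scores)),
           Color = status_color(scores))
```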



<figure class="wp-block-image size-large"><img loading="lazy" data-attachment-id="12237" src="https://i0.wp.com/datageeek.com/wp-content/uploads/2026/06/evidence_matrix.png?w=450&#038;ssl=1" alt="Targeted Reproducibility &#038; Execution Realism Matrix" class="wp-image-12237" data-recalc-dims="1" /></figure>



<h2 class="wp-block-heading">Conclusion: Reclaiming Empirical Rigor</h2>



<p class="wp-block-paragraph">The output matrix generated by this R script illustrates a sobering point: <strong>optimizing an LLM agent’s internal intelligence while ignoring its physical timing footprint can be self-defeating.</strong> When cognitive latency meets volatile market microstructure, theoretical priority hierarchies can be reshuffled.</p>



<p class="wp-block-paragraph">By pushing dynamic slippage parameters directly into your research data layer rather than treating them as a post-trade footnote, you can strip away much of the laboratorial illusion. Quantitative researchers must stop asking only how smart their financial agents are, and start measuring how fast those agents’ decisions decay on the trading desk.</p>




<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://datageeek.com/2026/06/17/auditing-llm-trading-bridging-theory-and-market-reality-with-the-gt-table-in-r/"> DataGeeek</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div>]]></content:encoded>
					
		
		<enclosure url="https://datageeek.com/wp-content/uploads/2026/06/datageeek-6a32534c9d982.png" length="0" type="" />
<enclosure url="https://1.gravatar.com/avatar/db5e3f9ef188ea98fe38ab05c5a3fad9fb52fe3472715a8fc02f7ea41731f77c?s=96&#038;d=identicon&#038;r=G" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/06/evidence_matrix.png?w=1024" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">402068</post-id>	</item>
		<item>
		<title>Bioconductor-centric hackathon on spatial omics and image-derived data</title>
		<link>https://www.r-bloggers.com/2026/06/bioconductor-centric-hackathon-on-spatial-omics-and-image-derived-data/</link>
		
		<dc:creator><![CDATA[Davide Risso]]></dc:creator>
		<pubDate>Wed, 17 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bioconductor.org/posts/2026-06-17-venice/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>A Bioconductor-centric hackathon dedicated to spatial omics was organized by members of the Bioconductor community – Davide Risso (University of Padua, Italy), Helena Crowell (CNAG Barcelona, Spain), and Wolfgang Huber (EMBL) – on 19-22 April on...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/bioconductor-centric-hackathon-on-spatial-omics-and-image-derived-data/">Bioconductor-centric hackathon on spatial omics and image-derived data</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://blog.bioconductor.org/posts/2026-06-17-venice/"> Bioconductor community blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>A Bioconductor-centric hackathon dedicated to spatial omics was organized by members of the Bioconductor community – Davide Risso (University of Padua, Italy), Helena Crowell (CNAG Barcelona, Spain), and Wolfgang Huber (EMBL) – and held from <strong>19 to 22 April on San Servolo, Italy</strong>, an island off the coast of Venice facing the Campanile of St. Mark’s Square.</p>
<p>The hackathon brought together <strong>27 researchers and software developers</strong> – from Germany, Switzerland, Italy, Spain, and the USA – to advance Bioconductor capabilities in spatial data handling and analysis, as well as the related topic of image analysis.</p>
<p>Participants were first invited based on their experience with the hackathon’s research themes and with software development; an open call to the Bioconductor community (and beyond) followed. The final group mixed early-career and senior researchers, including two <a href="https://scverse.org/" rel="nofollow" target="_blank">scverse</a> members and one industry researcher, with expertise spanning spatial omics, image analysis, and software development.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-17-venice/terrace.jpeg?w=578&#038;ssl=1" class="img-fluid figure-img" data-recalc-dims="1"></p>
<figcaption>Picture time on a terrace overlooking St. Mark’s Square from San Servolo island. (Back:) Elisabeth Purdom, Wolfgang Huber, Pere Moles Serò, Rafael Irizarry, Helena Crowell, Martin Emons, Dario Righelli, Juan Henao, Sean Davis, Gabriele Sales, Mike Smith, Ilaria Billato, Patrick Danaher, Hugo Gruson, Carissa Chen, Daria Lazic, Luca Marconato, Artür Manukyan. (Front:) Davide Risso, Sviatoslav Kharuk, Michael Stadler, Samuel Gunz, Robert Castelo, Charlotte Soneson, Matteo Calgaro, Gabriel Grajeda, Riccardo Ceccaroni.</figcaption>
</figure>
</div>
<p>The hackathon centered on spatial omics and other bioimaging data, with emphasis on data representation, interoperable serialization, scalable data handling, Python interoperability, and interactive visualization. It ran over three days, with most of the time spent in teams that independently developed and implemented a plan to address a challenge or meet a goal important to their members.</p>
<p>On the first day, the participants organized themselves into teams around four major themes:</p>
<ul>
<li><strong>Spatially stratified differential expression analysis</strong><br>
(Matteo Calgaro, Robert Castelo, Patrick Danaher, Pere Moles Serò)</li>
<li><strong>Image and segmentation data manipulation and visualization</strong><br>
(Riccardo Ceccaroni, Carissa Chen, Davide Risso, Mike Smith)</li>
<li><strong>Infrastructure and interoperability of spatial data in Bioconductor</strong><br>
(Helena Crowell, Martin Emons, Gabriel Grajeda, Hugo Gruson, Samuel Gunz, Rafael Irizarry, Daria Lazic, Luca Marconato, Charlotte Soneson, Michael Stadler)</li>
<li><strong>Facilitating use of foundation models for the Bioconductor community</strong><br>
(Ilaria Billato, Juan Henao, Wolfgang Huber, Sviatoslav Kharuk, Artür Manukyan, Elisabeth Purdom, Dario Righelli, Gabriele Sales)</li>
</ul>
<p><img src="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-17-venice/working.jpeg?w=578&#038;ssl=1" class="img-fluid" data-recalc-dims="1"></p>
<p>Each day started with a brief session in which each team set its goals for the day. Day 1 also included a single-slide, five-minute <strong>project plan presentation</strong> right after lunch. This midday presentation helped teams settle on a focused project quickly, with the understanding that the plan would likely change over the next two days.</p>
<p>Days 1 and 2 ended with an opportunity for each team to present its work and the challenges it had faced that day, again in a one-slide presentation. These <strong>daily afternoon summaries</strong> helped identify shared challenges, crystallize the day’s work, and provide visibility across project teams.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-17-venice/marco.jpeg?w=578&#038;ssl=1" class="img-fluid figure-img" data-recalc-dims="1"></p>
<figcaption>On the second day, the group crossed the water for a stroll through the streets of Venice on the way to an Italian dinner. This group picture was taken in St. Mark’s Square (Piazza San Marco), with St. Mark’s Basilica and the Campanile (bell tower) in the background.</figcaption>
</figure>
</div>
<p>The hackathon ended with a <strong>concluding showcase</strong> where each team presented their progress and demonstrated their technical achievements. To ensure these developments remain accessible to the community, teams documented their work (code, vignettes, and resources) in a dedicated <strong>GitHub repository</strong>. These results have been synthesized into a <strong>collaborative preprint</strong>, with each group contributing a detailed section summarizing their specific theme and findings.</p>
<ul>
<li><a href="https://github.com/BiocCodingCollaborations/VeniceHackathon2026" rel="nofollow" target="_blank">GitHub repository</a> housing code and resources developed during the hackathon.</li>
<li><a href="https://doi.org/10.37044/osf.io/9ej32_v1" rel="nofollow" target="_blank">Collaborative preprint</a> summarizing the format, themes, and outputs of the hackathon.</li>
</ul>
<hr>
<p>The event was organized by the Department of Statistical Sciences of the University of Padova in collaboration with EMBL and Venice International University, funded in part by the European Research Council (ERC) Grant CoG 101171662, and supported by EMBL’s Transversal Theme Theory@EMBL.</p>
<hr>



<p>
© 2026 Bioconductor. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bioconductor.org/posts/2026-06-17-venice/"> Bioconductor community blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/bioconductor-centric-hackathon-on-spatial-omics-and-image-derived-data/">Bioconductor-centric hackathon on spatial omics and image-derived data</a>]]></content:encoded>
					
		
		<enclosure url="https://blog.bioconductor.org/posts/2026-06-17-venice/terrace.jpeg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">402070</post-id>	</item>
		<item>
		<title>Bioconductor Maintainer Validation</title>
		<link>https://www.r-bloggers.com/2026/06/bioconductor-maintainer-validation/</link>
		
		<dc:creator><![CDATA[Lori Shepherd-Kern]]></dc:creator>
		<pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bioconductor.org/posts/2026-06-16-maintainer-validation/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Introduction<br />
Bioconductor policies include being an active and reachable maintainer. Maintainer emails in the DESCRIPTION of packages often go stale as maintainers change positions. There is also a necessity to have maintainers opt into Biocond...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/bioconductor-maintainer-validation/">Bioconductor Maintainer Validation</a>]]></description>
										<content:encoded><![CDATA[<!-- 
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://blog.bioconductor.org/posts/2026-06-16-maintainer-validation/"> Bioconductor community blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Bioconductor policies require maintainers to be active and reachable. Maintainer emails in package DESCRIPTION files often go stale as maintainers change positions. Maintainers also need to opt in to Bioconductor policies and procedures as these change over time.</p>
<p>We have created an application that uses Amazon Simple Email Service (SES) to email maintainers periodically to check that each address is still reachable and, once a year, to ask them to verify that they opt in to Bioconductor’s current policies and procedures and code of conduct.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/blog.bioconductor.org/posts/2026-06-16-maintainer-validation/MaintainerEmail.jpg?w=578&#038;ssl=1" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Photo of Email" data-recalc-dims="1"></p>
</figure>
</div>
<p>Initial feedback is that this email looks “spammy” and may be flagged as spam by some institutions, but it is a first attempt at compliance. In the future, we will look at alternatives to email, such as specialized maintainer account access.</p>
<section id="access-to-information" class="level4">
<h4 class="anchored" data-anchor-id="access-to-information">Access to Information</h4>
<p>The information is stored in a publicly accessible database. Rather than connecting directly to the web service, we recommend using the accompanying Bioconductor R package <a href="https://bioconductor.org/packages/BiocMaintainerApp/" rel="nofollow" target="_blank">BiocMaintainerApp</a>, which provides a Shiny application interface for querying Bioconductor package maintainers’ information.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i0.wp.com/blog.bioconductor.org/posts/2026-06-16-maintainer-validation/feature-image.jpg?w=578&#038;ssl=1" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Photo of ShinyApp" data-recalc-dims="1"></p>
</figure>
</div>
</section>
<section id="thank-you" class="level4">
<h4 class="anchored" data-anchor-id="thank-you">Thank you</h4>
<p>We appreciate maintainers’ cooperation moving forward.</p>


</section>
</section>

<p>
© 2026 Bioconductor. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bioconductor.org/posts/2026-06-16-maintainer-validation/"> Bioconductor community blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/bioconductor-maintainer-validation/">Bioconductor Maintainer Validation</a>]]></content:encoded>
					
		
		<enclosure url="https://blog.bioconductor.org/posts/2026-06-16-maintainer-validation/feature-image.jpg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">402032</post-id>	</item>
		<item>
		<title>New Package Submission Process</title>
		<link>https://www.r-bloggers.com/2026/06/new-package-submission-process/</link>
		
		<dc:creator><![CDATA[Lori Shepherd-Kern]]></dc:creator>
		<pubDate>Mon, 15 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://blog.bioconductor.org/posts/2026-06-15-new-submission-process-with-Runiverse/</guid>

					<description><![CDATA[<p>Introduction<br />
Bioconductor is moving towards using R-universe for its daily build system. See our previous blog post Collaborating between Bioconductor and R-universe on Development of Common Infrastructure. As we move in this direction it was a...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/new-package-submission-process/">New Package Submission Process</a>]]></description>
										<content:encoded><![CDATA[<!-- 
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://blog.bioconductor.org/posts/2026-06-15-new-submission-process-with-Runiverse/"> Bioconductor community blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Bioconductor is moving towards using R-universe for its daily build system. See our previous blog post <a href="https://blog.bioconductor.org/posts/2026-04-08-r-universe-collaboration/" rel="nofollow" target="_blank">Collaborating between Bioconductor and R-universe on Development of Common Infrastructure</a>. As part of this move, we also needed to update the submission process for Bioconductor packages. While the daily builders are still transitioning, the new submission location is now live. The new system uses GitHub Actions to trigger review milestones and R-universe as the build and check backend. It provides a smoother experience: it is more automated and avoids administrative steps that have historically bottlenecked the review process.</p>
</section>
<section id="information" class="level2">
<h2 class="anchored" data-anchor-id="information">Information</h2>
<section id="location" class="level4">
<h4 class="anchored" data-anchor-id="location">Location</h4>
<p>The new location for submitting new packages to Bioconductor for review is <a href="https://github.com/Bioconductor/BiocContributions" rel="nofollow" target="_blank">BiocContributions</a>. This replaces the old location at <code>Bioconductor/Contributions</code>.</p>
</section>
<section id="documentation" class="level4">
<h4 class="anchored" data-anchor-id="documentation">Documentation</h4>
<p>There is documentation on <a href="https://github.com/Bioconductor/BiocContributions/blob/devel/docs/submitters.md" rel="nofollow" target="_blank">What to Expect</a> as well as a detailed <a href="https://docs.google.com/presentation/d/1EK2wsDoRbtVGECdYC1GU5nGtYkN-h_7R-on-CSUC6CQ/edit?slide=id.p#slide=id.p" rel="nofollow" target="_blank">Slide Deck</a>.</p>
<p>There is also an <a href="https://github.com/Bioconductor/BiocContributions/blob/devel/docs/FAQs.md" rel="nofollow" target="_blank">FAQ</a> covering common questions, concerns, and troubleshooting.</p>
<p>If you need to report a problem with the new system, please open an issue on the <a href="https://github.com/BiocStaging/BiocSubmissionProcess" rel="nofollow" target="_blank">BiocSubmissionProcess</a> GitHub repository.</p>
</section>
<section id="what-about-bioconductorcontributions" class="level4">
<h4 class="anchored" data-anchor-id="what-about-bioconductorcontributions">What about Bioconductor/Contributions?</h4>
<p>The submission location at <code>Bioconductor/Contributions</code> has been frozen and no longer accepts new issues; <a href="https://github.com/Bioconductor/BiocContributions" rel="nofollow" target="_blank">BiocContributions</a> replaces it. If you have already submitted to the old location and have been assigned a reviewer, your review will be completed there. If you have not yet been assigned a reviewer, we will post shortly to close out your submission so it can move to the new location.</p>
</section>
<section id="easier-reproducibility" class="level4">
<h4 class="anchored" data-anchor-id="easier-reproducibility">Easier Reproducibility</h4>
<p>One frequent question we receive is how to reproduce the results of the build reports Bioconductor creates. Switching to R-universe as the build and check backend provides a reproducible testing environment. Any maintainer can run R-universe checks on the personal GitHub repository of a Bioconductor package by following these <a href="https://docs.r-universe.dev/bioconductor/#debugging-the-ci" rel="nofollow" target="_blank">instructions</a>. This lets maintainers test a package before submitting it to Bioconductor, and test future changes before pushing them to Bioconductor.</p>
</section>
<section id="thank-you" class="level4">
<h4 class="anchored" data-anchor-id="thank-you">Thank you!</h4>
<p>We appreciate your patience and understanding as we transition to the new system.</p>


</section>
</section>

<p>
© 2026 Bioconductor. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://blog.bioconductor.org/posts/2026-06-15-new-submission-process-with-Runiverse/"> Bioconductor community blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/new-package-submission-process/">New Package Submission Process</a>]]></content:encoded>
					
		
		<enclosure url="https://blog.bioconductor.org/posts/2026-06-15-new-submission-process-with-Runiverse/featured-image.jpg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">402034</post-id>	</item>
		<item>
		<title>Why we still need {admiral} in an age of AI</title>
		<link>https://www.r-bloggers.com/2026/06/why-we-still-need-admiral-in-an-age-of-ai/</link>
		
		<dc:creator><![CDATA[Jeff Dickinson]]></dc:creator>
		<pubDate>Sun, 14 Jun 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html</guid>

					<description><![CDATA[<p>There is a version of the AI-in-pharma story that goes like this: LLMs are trained on vast amounts of R code, so they can write ADaM programs on demand. Packages like {admiral} become optional, a style preference rather than a requi...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/why-we-still-need-admiral-in-an-age-of-ai/">Why we still need {admiral} in an age of AI</a>]]></description>
										<content:encoded><![CDATA[<!-- 
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html"> pharmaverse blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>There is a version of the AI-in-pharma story that goes like this: LLMs are trained on vast amounts of R code, so they can write ADaM programs on demand. Packages like <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> become optional — a style preference rather than a requirement. Just describe what you need and let the model figure it out.</p>
<p>The benchmark data from <a href="https://github.com/RConsortium/pharma-skills" rel="nofollow" target="_blank">pharma-skills</a> tells a different story.</p>
<section id="what-an-unskilled-agent-actually-does" class="level2">
<h2 class="anchored" data-anchor-id="what-an-unskilled-agent-actually-does">What an Unskilled Agent Actually Does</h2>
<p>When an AI coding agent is asked to derive an ADAE dataset without access to <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> skill guidance, it does not reach for <code>derive_var_trtemfl()</code> or <code>derive_vars_merged()</code> with the correct parameters. Across multiple benchmark runs, unskilled agents fell into two consistent failure modes: either generating synthetic data rather than using pharmaverse reference datasets, or writing bespoke <code>dplyr</code> pipelines that reimplemented logic <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> already provides — incorrectly.</p>
<p>One example from BDS benchmarking is particularly telling. Without skill guidance, agents consistently used <code>derive_vars_merged()</code> where <code>derive_vars_merged_lookup()</code> was required for parameter code assignment. Both functions exist in <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip>. Both execute without error. But <code>derive_vars_merged()</code> drops unmatched records silently, producing a dataset with the wrong row count. No warning. No crash. Just wrong output that passes a casual review.</p>
<p>This is not a model quality problem. It is a knowledge problem. The model does not know what the pharmaverse community knows.</p>
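<p>The failure mode described above, a merge that silently drops unmatched records, can be illustrated in a few lines of base R. The sketch below is an illustration only: it does not call {admiral}, and the toy <code>adlb</code> records and <code>param_lookup</code> table are hypothetical. It contrasts an inner join with a left join and adds the kind of row-count assertion that turns a silent discrepancy into a visible one.</p>

```r
# Illustration only: base R merge(), not {admiral}. All data are hypothetical.
adlb <- data.frame(
  USUBJID  = c("01", "01", "02", "03"),
  LBTESTCD = c("ALT", "AST", "ALT", "BILI"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from test code to parameter code; "BILI" is deliberately absent.
param_lookup <- data.frame(
  LBTESTCD = c("ALT", "AST"),
  PARAMCD  = c("ALT", "AST"),
  stringsAsFactors = FALSE
)

# Inner join (merge() defaults to all = FALSE): the BILI record disappears
# without any error or warning.
inner <- merge(adlb, param_lookup, by = "LBTESTCD")

# Left join: every input record is kept; unmatched records get PARAMCD = NA.
left <- merge(adlb, param_lookup, by = "LBTESTCD", all.x = TRUE)

# Report unmatched test codes instead of losing them quietly.
unmatched <- unique(left$LBTESTCD[is.na(left$PARAMCD)])
if (length(unmatched) > 0) {
  message("No PARAMCD mapping for: ", paste(unmatched, collapse = ", "))
}

# Assertion: this derivation step must not change the number of records.
stopifnot(nrow(left) == nrow(adlb))

# Compare record counts: 4 input, 3 after the inner join, 4 after the left join.
c(input = nrow(adlb), inner = nrow(inner), left = nrow(left))
```

<p>The point is the check, not the particular join: a one-line assertion on record counts is cheap, and it catches exactly the kind of error that otherwise passes a casual review.</p>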
</section>
<section id="the-package-as-specification" class="level2">
<h2 class="anchored" data-anchor-id="the-package-as-specification">The Package as Specification</h2>
<p><bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> is more than a collection of R functions. It is a community-maintained encoding of CDISC ADaM logic — accumulated through years of collaboration across sponsors, CROs, and regulators, tested against real submissions, and versioned for traceability. When a programmer calls <code>derive_vars_dtm()</code> with the correct imputation flags, they are not just writing R code. They are implementing a specification that has been reviewed, validated, and documented.</p>
<p>An LLM trained on general R code does not reliably inherit that specification. It has seen <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> in its training data, but not with the depth or precision needed to apply it correctly across the full range of ADaM derivation scenarios. The unskilled agent that wrote a custom <code>parse_dtc_datetime()</code> function using <code>substr()</code> and <code>as.POSIXct()</code> — rather than calling <code>derive_vars_dtm()</code> — was not being lazy. It was doing its best with what it knew. Its best was not good enough, and the errors it introduced were in the edge cases that matter most in a clinical submission.</p>
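<p>To see why a hand-rolled parser fails on exactly these edge cases, consider the hypothetical sketch below. <code>parse_dtc_naive()</code> is not the agent’s actual code; it is a minimal base R stand-in built from <code>substr()</code> and <code>as.POSIXct()</code> that assumes every <code>--DTC</code> value is a complete ISO 8601 date-time. SDTM date-time variables, however, routinely hold partial values.</p>

```r
# Hypothetical naive parser (not the agent's code, not {admiral}).
# Assumes every value is a complete "YYYY-MM-DDThh:mm:ss" date-time.
parse_dtc_naive <- function(dtc) {
  as.POSIXct(substr(dtc, 1, 19), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

# ISO 8601 values of the kind found in SDTM --DTC variables, from complete to partial.
dtc <- c(
  "2019-07-18T15:25:40",  # complete date-time
  "2019-07-18T15:25",     # seconds missing
  "2019-07-18",           # time missing
  "2019-07",              # day missing
  ""                      # date missing entirely
)

parsed <- parse_dtc_naive(dtc)

# Only the complete value survives; every partial value becomes NA,
# with no error or warning to signal that information was lost.
print(data.frame(dtc = dtc, parsed = parsed))
sum(is.na(parsed))
```

<p>In {admiral}, <code>derive_vars_dtm()</code> handles these cases through explicit imputation arguments (for example <code>highest_imputation</code>, <code>date_imputation</code>, and <code>time_imputation</code>) and can record what was imputed in flag variables; check the reference page for the version you pin before relying on specific defaults.</p>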
</section>
<section id="what-the-skill-does" class="level2">
<h2 class="anchored" data-anchor-id="what-the-skill-does">What the Skill Does</h2>
<p>The <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> skills in pharma-skills do not replace <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip>. They connect the AI agent to it. A skill provides curated, domain-aware guidance: which functions to use for which derivations, how to structure the program for QC readability, which variables require special handling, and what assertions to include. The skill is the bridge between a capable general-purpose model and the specific, validated logic the pharmaverse community has built.</p>
<p>The benchmark results reflect this directly. Across ADSL, ADAE, ADVS, and ADLB:</p>
<ul>
<li><strong>With skill: 88–100%</strong> pass rates across domains</li>
<li><strong>Without skill: 17–59%</strong> pass rates, with high variance</li>
</ul>
<p>That variance in the unskilled condition matters as much as the mean. Inconsistent output is not a defensible process in a GxP context. A skill-guided agent produces consistent, traceable, <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip>-anchored code. An unskilled agent produces something different every time.</p>
</section>
<section id="the-accountability-anchor" class="level2">
<h2 class="anchored" data-anchor-id="the-accountability-anchor">The Accountability Anchor</h2>
<p>There is a regulatory dimension here that goes beyond code quality. A clinical submission needs to trace its derivations to validated, versioned, documented methods. A bespoke LLM-generated pipeline — however functional — has no such anchor. <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> provides it. When a submission uses <code>derive_var_trtemfl()</code> from a pinned version of <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip>, the derivation logic is documented, community-reviewed, and reproducible. The AI is most useful when it is writing code that inherits those properties, not when it is improvising around them.</p>
<p>This is why the pharma-skills project frames skills not as prompt templates but as domain knowledge artifacts. The goal is not to make AI write more R code; it is to make AI write <bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> code correctly, consistently, and in a form that a human reviewer can audit and a regulatory submission can defend.</p>
<p><bslib-tooltip placement="auto" bsoptions="[]" data-require-bs-version="5" data-require-bs-caller="tooltip()"> <template>admiral</template> <a href="https://pharmaverse.github.io/admiral/" class="r-link-pkg" rel="nofollow" target="_blank">{admiral}</a> </bslib-tooltip> was built for exactly this moment. The community just needs to make sure AI knows how to use it.</p>
</section>
<div id="quarto-appendix" class="default"><section id="last-updated" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Last updated</h2><div class="quarto-appendix-contents">

<p>2026-06-14 18:52:36.863336</p>
</div></section><section id="details" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Details</h2><div class="quarto-appendix-contents">

<p><a href="https://github.com/pharmaverse/blog/tree/main/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.qmd" rel="nofollow" target="_blank">Source</a>, <a href="https://pharmaverse.github.io/blog/session_info.html" rel="nofollow" target="_blank">Session info</a></p>
</div></section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="nofollow" href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre>@online{dickinson2026,
  author = {Dickinson, Jeff},
  title = {Why We Still Need \{Admiral\} in an Age of {AI}},
  date = {2026-06-14},
  url = {https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html},
  langid = {en}
}
</pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dickinson2026" class="csl-entry quarto-appendix-citeas">
Dickinson, Jeff. 2026. <span>“Why We Still Need {Admiral} in an Age of
AI.”</span> June 14, 2026. <a href="https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html" rel="nofollow" target="_blank">https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html</a>.
</div></div></section></div> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html"> pharmaverse blog</a></strong>.</div>
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/06/why-we-still-need-admiral-in-an-age-of-ai/">Why we still need {admiral} in an age of AI</a>]]></content:encoded>
					
		
		<enclosure url="https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">401927</post-id>	</item>
	</channel>
</rss>
